In October 1920, military forces dispatched by Ibn Saud Abd-al Aziz, the soon-to-be monarch of the Kingdom of Saudi Arabia, defeated a Kuwaiti contingent outside of the city of Jahra, about twenty-five miles west of Kuwait City. One month earlier, Ibn Saud had claimed the lands of Kuwait as Saudi territory, much as he and his ancestors had considered the entire eastern coast of the Arabian Peninsula as their dynastic patrimony. The Kuwaiti commander appealed for British support, and the subsequent arrival of warplanes, a short but effective bombardment from warships on patrol in the Gulf, and the appearance of a detachment of Royal Marines prompted Saudi withdrawal (Abu-Hakima 1983, 134). Over the next three decades, British intervention would repeatedly block Saudi efforts to annex the five principalities of the eastern littoral—Kuwait, Bahrain, Qatar, the Trucial States (the United Arab Emirates after 1971), and Oman (henceforth the GCC-5). Without British intervention, the GCC-5 would, in all likelihood, be Saudi provinces, not sovereign countries (Macris 2010, 27).
Our contention is that the survival of the GCC-5 as sovereign states constitutes previously undiagnosed survivorship bias, a form of endogenous selection or collider bias. The GCC-5 are paradigmatic rentier states, with enormous wealth derived from their hydrocarbon endowments and stubbornly resilient autocracies. We argue here that claims about the resource curse—the causal relationship between oil and autocracy—may reflect, at least in part, selection bias on the Arabian Peninsula that has generated false positives, artificially inflating the degree of evidentiary support for the resource curse hypothesis.
Our claim is not based on counterfactual speculation; rather, it is derived from a specific causal model about the particular data-generating process that actually produced multiple rentier states on the Arabian Peninsula. The causal model of the data-generating process that we reconstruct from available evidence features a phenomenon called, in graph-theoretical terms, collider bias (Pearl 2000, 17; Pearl and Mackenzie 2018, 185-86, 197-200; Elwert and Winship 2014). A collider is a node in a causal graph that has at least two arrows entering it. In the absence of conditioning, a collider blocks the indirect path between its two parent nodes, indicating the absence of probabilistic dependence between them. Conditioning on a collider variable—selecting cases based on values of the collider or one of its descendants—induces probabilistic association between two variables that are otherwise statistically independent of one another. Collider bias is related to the more conventional understanding of selection bias, in which the units non-randomly selected for inclusion in a study do not represent the full variation of the dependent variable (Achen 1986; Geddes 1990). Collider bias, however, is a broader phenomenon than a truncated dependent variable: it can occur without any selection on the dependent variable itself and can occur without any deliberate sampling strategy by the investigator. One way that selection bias can be induced by the world is by survivorship bias, or differential rates of survival caused by a process that is itself related, directly or indirectly, to the dependent variable. If survival is a collider variable, then we have observations only for those units that survived; therefore, the data-generating process is itself conditioning on the collider variable, potentially inducing bias.
We intend this paper to make a broad methodological contribution to comparative politics. Standard operating procedure in the cross-national study of politics is to test hypotheses against an “all-countries” data set in which the investigator makes no decisions about sampling. Consideration of the data-generating process is restricted to the econometric properties of the models that will be estimated on this data: the functional form of the model, the distribution of the error term, and possible covariance between errors and regressors. A more expansive notion of the data-generating process, in contrast, would ask questions about how the data is produced, in terms of conceptualization and measurement, but also in terms of the construction of units. Before we can test hypotheses using data sets built on pre-existing units, we may need to consider how those units came into existence to ensure we are not inadvertently introducing selection bias.
We proceed as follows. The first section locates the problem of survivorship bias within the literature on exogenous borders as natural experiments. We do not simply claim that in some cases sovereignty is endogenous to a prior cause. Using foundational ideas from graph theory, we discuss the structure of collider bias and offer a specific causal model that would imply survivorship bias on the Arabian Peninsula. The second section reconstructs the actual data-generating processes that yielded survivors and failures on the Arabian Peninsula as well as in other global regions. We conclude that our model of survivorship bias describes the emergence of sovereign states on the Arabian Peninsula but not elsewhere. We are not, in other words, “cherry-picking” cases that conveniently comport with our theoretical priors. The third section corrects for survivorship bias on the Arabian Peninsula and the false positives it generates. We justify our decision to estimate two sets of statistical models, one that contains the GCC-5 as sovereign states and another that represents the counterfactual implication of what would have happened without British intervention. As expected, we find that our correction for survivorship bias and the false positives it generates pushes the absolute magnitude of estimated regression coefficients towards zero and prevents us from rejecting a null hypothesis. We conclude with a plea for more research that is sensitive to broader conceptualization of the data-generating process and the types of bias it might induce.
Sovereignty and Survivorship Bias
A great deal of recent scholarship exploits seemingly arbitrary territorial or jurisdictional borders separating otherwise similar units that are exposed to treatment on one side of the border but not on the other. Insofar as the existence or placement of the border corresponds to “as-if” randomization, using borders to assign units to treatment or control constitutes a natural experiment that promises unbiased estimates of causal effects (Dunning 2012). Whether populations on either side of the border are statistically equivalent to one another rests entirely on the claim that the borders were arbitrarily drawn. McCauley and Posner (2015, 411; see also Keele and Titiunik 2016) report, however, that many studies merely allege that African borders were arbitrarily drawn without careful reconstruction of the underlying assignment process.
Establishing the arbitrary nature of borders often requires careful qualitative reconstruction of the historical process of border creation. To explain variations in French resistance activities during World War II, for example, Ferwerda and Miller (2014) argue that the demarcation line between German-occupied and Vichy France followed an arbitrary course at the local level without regard for preexisting administrative borders; different structures of authority in the two regions of France thus act as an “as if” randomly assigned treatment in the vicinity of the border. Reexamining the assignment process, however, Kocher and Monteiro (2016) conclude that the demarcation was anything but arbitrary, for it followed the major railroads running from Germany to the Atlantic coast. Acts of resistance were more numerous in the occupied zone relative to Vichy France precisely because a major element in Allied strategy was to destroy German railroads and hence their capacity to mobilize forces for a counterattack.
If truly exogenous borders are inferentially valuable, endogenous borders would appear to be a nuisance, contributing little to unbiased inferences but otherwise safely ignored, as when cross-national statistical analysis uses an “all-countries” data set. Yet prior to a country having borders, it must become a sovereign state, and sovereignty itself is subject to powerful selection pressures. As Tilly (1990) pointed out in a powerful critique of his own earlier work, studies of European political development that work retrospectively from existing sovereign states are plagued by survivorship bias because the vast majority of European states and statelets—more than 500 in 1490, even more in earlier centuries—simply disappeared. Theorizing about political development would have to take into account the causes of differential rates of survival. In a parallel argument, Spruyt (1994) rejects theories positing unilinear evolution from pre-sovereign political entities to contemporary sovereign states. These theories, he argues, are based on the historical experience of a small number of Western European countries; consequently, they commit the fallacy of affirming the consequent because they focus exclusively on the “winners” of a centuries-long process of selection in which multiple forms of political organizations were feasible alternatives (Spruyt 1994, 18).
Yet even once survivorship bias has been recognized, it is not always clear how to correct for it. Despite his acute insight into the problem of survivorship bias, Tilly still based his bellicist theory of state formation on a small number of seemingly paradigmatic cases: England, France, Brandenburg-Prussia, and the Dutch Republic loom large in his accounts, while he spends little time on large states that failed to compete successfully, such as Burgundy, Hungary, and Poland. To correct for survivorship bias, Abramson (2017) has painstakingly assembled a data set describing the entire universe of European political units between 1100 and 1790. Bellicist theory predicts that selection will favor larger political units able to deliver more soldiers and tax revenue. Contra this claim, Abramson finds that, once a log-normal transformation is applied to the measure of state size to fit the pattern of outliers (in the untransformed data, the mean is above the seventy-fifth percentile), the typical state actually declined in size over this period and smaller states were more likely to survive.
It would be helpful at this point to give a more formal treatment of survivorship bias as a special case of endogenous selection bias, also known as collider bias. Figure 1 depicts a simple causal graph with three nodes. The absence of an arrow between the treatment T and the background condition Z denotes their statistical independence. Arrows exit both nodes and enter the outcome variable, Y, denoting a relationship of statistical dependence of Y on T and Y on Z. In the language of graph theory, Y is thus a collider variable, which blocks the transmission of probabilistic dependence between T and Z. For a unit selected at random, information about Z would not alter the probability distribution of T.
One of the key insights of graph theory is that conditioning on a collider variable unblocks the non-causal path T—Y—Z and hence induces an association between T and Z (Pearl 2000, 17; Pearl and Mackenzie 2018, 185-86, 197-200; Elwert and Winship 2014). To see why, consider a simple model, depicted in figure 2.Footnote 1 As represented, the variables talent and beauty are statistically independent of one another, and both have a causal relationship with the outcome variable, celebrity status. The box around the outcome variable represents our conditioning on that variable: in our example, suppose we examined only celebrities. Given the assumptions of our model, we will induce a probabilistic relationship between beauty and talent, as a celebrity who is not beautiful must be talented, such that P(beautiful | no talent) is not equal to P(beautiful). A simple simulation, depicted in figure 3, shows the implications of conditioning on a collider. In the bottom panel, representing the full sample, beauty and talent are independently distributed; the two top panels, representing conditioning on celebrity status, show a negative relationship between beauty and talent among non-celebrities on the left and among celebrities on the right.
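The celebrity example can be reproduced with a short simulation. The sketch below is illustrative only and is not the code behind figure 3: the function name `simulate_celebrity`, the standard-normal traits, the additive rule for celebrity status, and the threshold of 1.5 are all assumptions chosen for convenience.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_celebrity(n=20000, threshold=1.5, seed=0):
    """Talent and beauty are drawn independently; celebrity is their collider."""
    rng = random.Random(seed)
    talent = [rng.gauss(0, 1) for _ in range(n)]
    beauty = [rng.gauss(0, 1) for _ in range(n)]
    # Assumed rule: a unit is a celebrity when talent + beauty exceeds the threshold.
    is_celebrity = [t + b > threshold for t, b in zip(talent, beauty)]

    def corr_where(flag):
        # Correlation of talent and beauty within one stratum of the collider.
        pairs = [(t, b) for t, b, c in zip(talent, beauty, is_celebrity) if c == flag]
        return pearson([p[0] for p in pairs], [p[1] for p in pairs])

    return {
        "full": pearson(talent, beauty),
        "celebrities": corr_where(True),
        "non_celebrities": corr_where(False),
    }


if __name__ == "__main__":
    for group, r in simulate_celebrity().items():
        print(f"{group}: r = {r:.3f}")
```

Under these assumptions the full-sample correlation is close to zero, while the correlation within each stratum of celebrity status is negative, which is the pattern the text attributes to conditioning on a collider.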
Survivorship bias is a specific form of collider bias that does not require selection on the dependent variable. In the left panel of figure 4, C is a collider without conditioning; it thus blocks the non-causal path T—C—Z—Y and hence faithfully represents the statistical independence of T and Z as well as between T and the descendant of Z, the outcome variable Y. In the right-hand panel, the collider variable is represented by survival; because we do not observe units that do not survive, we implicitly condition on survival and hence unblock the path T—survival—Z—Y, creating spurious association between T and Y. The classic example of survivorship bias occurred during World War II when, based on observations of damaged aircraft, recommendations were issued to increase the armor on bombers. Those recommendations were scrapped when the statistician Abraham Wald pointed out that researchers had examined only those aircraft that survived, which must therefore have sustained damage to areas of the plane’s structure that did not result in the plane’s failure; by implication, extra armor should be added to regions of the plane’s structure where damage could not be observed because the plane crashed.
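The right-hand panel of figure 4 can be sketched in the same way. The simulation below is a minimal illustration under stated assumptions, not a reproduction of any analysis in the paper: the name `simulate_survivorship`, the normal distributions, the survival rule (T + Z above zero), and the outcome equation (Y equals Z plus noise, with no effect of T) are all hypothetical choices.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_survivorship(n=20000, seed=0):
    """T and Z are independent; survival depends on both; Y depends on Z only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = rng.gauss(0, 1)          # treatment
        z = rng.gauss(0, 1)          # background condition
        y = z + rng.gauss(0, 1)      # outcome: caused by Z, NOT by T (assumed)
        survived = t + z > 0         # survival is a collider of T and Z (assumed rule)
        rows.append((t, y, survived))

    all_t = [t for t, _, _ in rows]
    all_y = [y for _, y, _ in rows]
    surv_t = [t for t, _, s in rows if s]
    surv_y = [y for _, y, s in rows if s]
    return {
        "all_units": pearson(all_t, all_y),
        "survivors_only": pearson(surv_t, surv_y),
    }


if __name__ == "__main__":
    for sample, r in simulate_survivorship().items():
        print(f"{sample}: corr(T, Y) = {r:.3f}")
```

In this sketch T has no effect on Y by construction, yet an analyst who observes only survivors finds a clearly non-zero correlation between them. No selection on Y itself and no sampling decision by the investigator is involved.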
One important implication is that survivorship bias is not a standing condition that can be easily ascertained by simple observation: it is always relative to a specific causal model, and hence identifying it requires careful attention to contextual details relevant to a particular theoretical model. By way of example, we contend that survivorship bias is implicated in studies of the political resource curse. There is an enormous literature on the consequences of oil wealth for economic growth, civil war, state institutions, and political regimes. We focus here on the relationship of oil to democracy; specifically, the more precise form of the theory proposing that, especially in the period after 1980, oil-rich autocracies have a lower likelihood of making a transition to democracy (Ulfelder 2007; Ross 2012; Andersen and Ross 2014; Wiens, Poast, and Clark 2014; Wright, Frantz, and Geddes 2015; Houle 2018). There is also evidence of substantial regional heterogeneity; oil does not appear to have inhibited Latin American transitions to democracy, for example (Ross 2012; Ahmadov 2014).
We contend that studies of the political resource curse may contain substantial endogenous selection bias in the form of survivorship bias. Because the rules by which history differentially selects survivors cannot be extracted directly from the data, it becomes “essential to discover the processes by which these data are produced” (King, Keohane, and Verba 1994, 135). Figure 5 depicts our model of survivorship bias on the Arabian Peninsula.Footnote 2 This is a local model of sovereignty, inductively derived from studying the region’s history. We know that there are multiple causal processes that have led to sovereign nations in Europe and in the post-colonial world over the past few centuries. We thus make no claim about the generality of the model; section two provides the historical evidence justifying the model and demonstrates its contingent and local nature by failing to find other cases consistent with the model.
Note that figure 5 is structurally analogous to the graphs in figure 4, except that in figure 5, survival is a descendant of the collider variable, protection. Conditioning on the effect of a collider has the same consequences as conditioning on the collider itself (Elwert 2013, 251), as the descendant (survival) contains information about both oil and policy that has been encoded in the collider variable, protection. The causal model in figure 5 contains no arrow between oil and autocratic resilience; we omit this arrow to represent our theoretical priors consistent with the null hypothesis: as we discuss later, a substantial and persuasive literature attributes the durability of the GCC-5 autocracies to highly specific features of their political institutions. Thus, figure 5 implies that our cross-national data sets include cases for which observations of an association between oil and autocratic resilience may be spurious; this bias may be of sufficient magnitude that it exercises a large effect on our overall assessment of the oil-autocracy relationship. In effect, we claim that history may have “over-sampled” small, oil-rich monarchies.
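The logic of conditioning on a descendant of a collider can also be illustrated numerically. The sketch below encodes one reading of the structure described for figure 5 (oil and policy both feed protection, protection raises the chance of survival, and policy alone raises the chance of autocratic resilience). Every probability, the binary coding, and the name `simulate_descendant` are hypothetical and chosen purely for illustration; none is an estimate from the historical record.

```python
import random


def share(values):
    """Proportion of True values in a non-empty list of booleans."""
    return sum(values) / len(values)


def simulate_descendant(n=40000, seed=0):
    """Condition on survival, a noisy descendant of the collider 'protection'."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        oil = rng.random() < 0.5       # oil endowment, independent of policy
        policy = rng.random() < 0.5    # interventionist external policy
        protection = oil and policy    # collider: requires both parents (assumed)
        # Survival is a noisy descendant of protection (assumed probabilities).
        survival = rng.random() < (0.95 if protection else 0.30)
        # Resilience depends on policy only; there is no arrow from oil.
        resilience = rng.random() < (0.80 if policy else 0.30)
        units.append((oil, survival, resilience))

    def oil_gap(rows):
        # Difference in the share of resilient units, oil-rich minus oil-poor.
        with_oil = [r for o, _, r in rows if o]
        without_oil = [r for o, _, r in rows if not o]
        return share(with_oil) - share(without_oil)

    survivors = [u for u in units if u[1]]
    return {"all_units": oil_gap(units), "survivors_only": oil_gap(survivors)}


if __name__ == "__main__":
    for sample, gap in simulate_descendant().items():
        print(f"{sample}: resilience gap (oil minus no oil) = {gap:.3f}")
```

With these illustrative parameters the gap is near zero across all units but positive among survivors, even though survival is only an imperfect proxy for protection and oil has no effect on resilience in the simulated world.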
Figure 5 also omits an arrow from oil to British policy. We think this omission is well justified, and that even were we to include this arrow, it would not substantially alter our conclusions. Although the British were motivated by the need for secure access to cheap oil, that need did not directly imply a policy of protecting small principalities, and they did not adopt this policy in general. Furthermore, even were we to include an arrow, we would create the causal path oil→British policy→autocratic resilience, which is not consistent with any existing theory of the resource curse and would not resolve the problem of survivorship bias caused by conditioning on a collider variable.
Evaluating the Causal Model and Assessing Its Uniqueness
Our causal model makes three claims about British intervention and the survival of the GCC-5: (1) it was motivated by the goal of acquiring secure access to cheap oil; (2) it prevented these polities from being absorbed into the expanding Saudi state; and (3) it created the institutional foundation for long-term autocratic durability. The first two claims imply the counterfactual that absent British intervention, the GCC-5 would not exist as sovereign countries; the third claim implies that, conditional on their survival, the observed association of oil wealth and autocratic durability is potentially a false positive with respect to the resource curse. We use qualitative evidence about historical processes to probe the credibility of these claims comprising our causal model.Footnote 3
Beginning in approximately 1700 and premised on the need to safeguard transportation and communications with colonial India, British officials entered into treaty arrangements with local leaders of principalities along the eastern coast. These arrangements constituted forms of indirect rule, transforming principalities into quasi-protectorates in name, if not in law. One constant across the centuries was that British officials were indifferent to which power held dominion over the Gulf, as long as that power complied with British interests (Yapp 1980). For example, when in the 1830s, an emerging Saudi state extended its control over the eastern coast, British officials rebuffed pleas from their local allies to intervene and protect their independence from Saudi aggression, even countenancing Saudi control over the strategically vital port of Muscat (Goldberg 1986, 18).
Britain’s resolute refusal to intervene in local politics ceased with the onset of the oil age. In June 1912, upon becoming First Lord of the Admiralty, Winston Churchill wrote to Admiral John Fisher, “You have got to find the oil; to show … how it can be purchased regularly and cheaply in peace; and with absolute certainty in war” (Black 2011, 38). Prior to World War I, Britain’s search for cheap and secure oil focused primarily on Mesopotamia and Iran (Monroe 1981). But as Germany gained an early advantage in obtaining an exclusive concession to explore for Mesopotamian oil reserves, British officials grew wary of negotiating with the rulers of the Ottoman and Persian dynasties, learning the painful lessons that, as one official put it, “In a highly centralized theocracy … every big trade concession is regarded as an Imperial favor to be bestowed on the seemingly friendly, a category in which, needless to say, we are not included” (Black 2011, 33).
Consequently, on the eve of World War I, British officials turned to the small Gulf principalities, where their informal empire would allow them to negotiate exclusive agreements with weak and divided rulers in the highly likely event that oil was discovered there (Heard-Bey 1982, 295). Beginning in October 1913, Britain obligated the leaders of the small principalities along the Gulf to entrust the development of their oil fields only to representatives of Great Britain (Longrigg 1954, 26-27). A letter from Mubarak al-Sabah to the British Resident obligated the Kuwaiti ruler to disclose to the British Admiralty “the place of bitumen in Burgan and elsewhere and if in their view there seems some hope of obtaining oil therefrom we shall never give a concession in this matter to anyone except a person appointed from the British Government” (Abu-Hakima 1983, appendix 4, 19). British officials concluded similar agreements with the Al-Khalifah rulers of Bahrain in 1914, the Al-Thani rulers of Qatar in 1916, and rulers of the Trucial States and of Oman in the first few years after World War I (Khalifa 1979, 22).
The major threat to Britain’s hegemony over the eastern seaboard and access to its future oil supplies was the territorial ambitions of the Al-Saud dynasty. On two occasions in the nineteenth century, Saudi states conquered the peninsula, from the eastern shoreline to the holy cities of Mecca and Medina on the western coast. On both occasions, external intervention decisively terminated the Saudi quest to become a regional hegemon (Lustick 1997). The third effort to build a Saudi state stretching from coast to coast began in 1902, became dormant during the war, but was reinitiated in 1920 with Ibn Saud’s effort to conquer Kuwait. In light of their new oil interests in the Gulf, British officials could no longer be indifferent to Saudi expansion (Macris 2010, 24-27).
British intervention to deter the Saudi military conquest of Kuwait in 1920 did not extinguish Saudi efforts to dominate and absorb coastal principalities. One strategy pursued to great effect by Ibn Saud was to form alliances with dissident members of ruling dynasties and nomadic tribes to gain control over the hinterlands, from where he would then gradually encroach on the ruler’s authority in coastal cities. This strategy would at minimum reduce the principality to a de facto city-state; at maximum, it would be the first step towards indirect rule and even annexation and de jure incorporation. By 1930, Ibn Saud was the de facto suzerain of much of the coastline, prompting British officials to claim that they controlled the “front door” to the principalities, but not their “back door” (Zahlan 1979, 81-82).
A complementary strategy was to contest the location of borders, expanding the area under Saudi control so that it incorporated major oil fields. The question of borders was inextricably bound up with oil, because a company that signed a concession for oil exploration and exploitation needed to know the exact boundary of that concession (Leatherdale 1983, 221). Without internationally recognized borders, Ibn Saud could tell Aramco geologists who located oil-rich regions, “You tell me the areas that interest you and I will tell you if it is mine” (Morton 2013, 76-85). By the middle of the 1930s, Ibn Saud claimed the vast majority of the territory where major oil fields would soon be discovered, thus threatening the entire edifice of British imperial control of the eastern seaboard (Kelly 2018, 3-4; Abdallah 1978, 65).
The struggle over borders and oil fields continued over the next two decades, culminating with the Saudi military occupation of an oil-rich region straddling the border of Abu Dhabi and Oman. An increasingly desperate Foreign Secretary Harold Macmillan argued to the cabinet in mid-1955 that “If we were to allow the Saudis to impose a major defeat on us, the whole of our position [in the Gulf] might easily slip away” (Morton 2013, 175). The result was Operation Bonaparte in October 1955, a British-planned invasion which recaptured Saudi-occupied territory but caused a diplomatic rift between Britain and the United States that further weakened Britain’s hold over its informal empire.
Throughout the early decades of the twentieth century, British officials were well aware that their intervention was crucial to the ongoing independence of the principalities. As early as 1905, Sir Percy Cox, the British Political Resident for the Gulf, warned of Ibn Saud’s potential to dominate the entire coast and thus threaten key British interests, repeating his warning in 1913, as Ibn Saud concluded his conquest of the eastern coastline between Kuwait and Qatar (Abdallah 1978, 175). “I have no doubt that Bin Saud could eat up Qatar in a week,” the British Political Resident reported in the mid-1920s, “and I am rather afraid that he may do so” (Zahlan 1979, 59). As late as 1933, British officials feared that they had “lost Kuwait,” and it was only continued British intervention that allowed Kuwait to survive (Tetreault 1991, 573; Zahlan 1978, 75). Contemporary historians of the Gulf agree that without British protection, Ibn Saud would have faced little resistance in his efforts to absorb the entire lower coast (Commins 2012, 153; Zahlan 1978, 81-87). Liou and Musgrave (2014, 1596) posit that one cannot conceive of a plausible counterfactual history of the Gulf monarchies without oil; we have demonstrated that without oil, these monarchies would have ceased to exist.
Having committed to preserving the independence of the five principalities, the British were driven by inexorable logic to safeguard rulers from domestic threats as well: without maintaining signatories in power, their secure access to cheap oil might suddenly become insecure. Yet at the same time, their protection of local rulers threatened moral hazard (Onley 2009). In the 1920s, therefore, British agents intervened to end violent, internecine intra-dynastic struggles for power as brothers, sons, and nephews challenged rulers for primacy. Especially as Ibn Saud attempted to instigate intra-family dissidence, Britain supervised domestic politics more closely and intervened more readily to ensure political stability (Zahlan 1978, 34-38). Overt British intervention ended a series of succession crises in nearly every coastal principality; in Bahrain, Britain even forced the abdication of Shaykh Issa Ali al-Khalifa in favor of his son, Hamed, who was viewed as being more accommodating of British demands (Al-Tajir 1987, 45-46; Davidson 2008, 21-22; Davidson 2009, 30; Fromherz 2012, 76; Kinninmont 2011, 33; Morton 2016, 132).
As battles over dynastic succession were ended, Britain next worked to increase the effectiveness and scope of local administrative structures (Zahlan 1978, 57). In Bahrain, the British resident centralized power in newly created organs of the state and stripped authority from traditional bodies, as in the abolition of the tribal court system and the creation of a modern court system, and placed British agents in charge of the main sources of revenue (Al-Tajir 1987, 53-103; Gause 1994, 20; Kinninmont 2011, 34). The model was then extended to the other principalities, with British administrators taking control of new bureaucratic agencies, commanding the new police and armed forces that were manned by formerly rebellious but now loyal tribesmen, and collecting and disbursing tax revenue (Gause 1994, 23). Members of the royal family quickly came to dominate the “political” ministries, especially the armed forces and the police, leaving the “technocratic” ministries for trained specialists from outside the family. The net result was a system featuring stable dynastic politics and a central state that had gained dominance over the tribes, creating a potentially self-sustaining equilibrium into the future (Khuri 1980).
We contend that British intervention endowed local principalities with the institutional foundations of durable autocratic rule. Gandhi and Przeworski (2007) argue that monarchies are durable forms of autocracy because rulers can rely on kin networks to mitigate potentially potent threats from regime insiders, obviating the need for high-cost institution building. Menaldo (2012, 2016) contends that “the monarchy effect” is unique to the Middle East, where monarchies have developed distinctive political cultures that allow rulers to make credible commitments, cementing support coalitions to the ruler and sharply mitigating threats to autocratic survival. Herb (1999) distinguishes dynastic monarchies that exploit the solidarity of kinship networks to staff key positions in the government and preempt succession crises from non-dynastic monarchies that exclude family members: in the Middle East, Herb demonstrates that only dynastic monarchies have survived over the long term. We contend that these dynastic monarchies are concentrated on the Arabian Peninsula in large part due to British intervention and reform in the 1920s and 1930s, decades prior to the large-scale exploitation of oil. The GCC-5 were thus not garden-variety monarchies: because of British intervention, they became monarchies on steroids, capable of suppressing diverse sources of political instability.
This historical evidence is all consistent with the causal model of figure 5: motivated by a search for oil, British policymakers intervened at two levels, once to protect the principalities from Saudi expansion and a second time to create the institutional foundations of durable monarchies. Theory and evidence imply, therefore, that survivorship bias on the Arabian Peninsula generates false positives, in which the association of oil and autocratic durability may represent a non-causal relationship. Before correcting for this bias, however, we must consider whether there are other instances that correspond to the causal model and that might represent either additional false positives or false negatives.
We believe, however, that we are unlikely to discover more cases, since the causal model of sovereignty encodes a great many contextually specific conditions and events that are unlikely to be replicated elsewhere. We can narrow our search by eliminating two paths to sovereignty: those that resulted in sovereign states prior to the onset of the oil age, and those in which sovereign units were arguably delineated “as if” by randomization. In either case, our causal model with its implication of survivorship bias cannot possibly describe the path to sovereignty. Latin America represents the first path of pre-oil sovereignty: the contemporary map of Latin America was delineated in the aftermath of early nineteenth-century wars of independence, and contemporary borders largely overlap with Spanish colonial administrative units such as viceroyalties, captaincies, and audiencias. Several projects to create large federated states (Gran Colombia, the United Provinces of South America, and the Federal Republic of Central America) splintered into smaller sovereign nations, with relatively minor post-independence border adjustments (Alesina, Easterly, and Matuszeski 2011, 252).
Africa represents the second path. African colonial borders are conventionally understood to have been delineated by arbitrary processes during the late-nineteenth-century “scramble for Africa,” most notably during the 1884–1885 Berlin Conference, when seven European powers agreed upon their zones of influence. Intent on preserving the status quo and preventing further intra-European conflict over African territories, European diplomats drew borders without taking into account local conditions, with limited knowledge of local geography, and without in any way forecasting future independence: one consequence of this relative ignorance is that 44% of the borders created after the Berlin Conference followed straight lines (Englebert, Tarango, and Carter 2002, 1096). To a large degree, these arbitrarily drawn borders remained unchanged after independence, with 80% of borders tracing lines of latitude or longitude (Alesina, Easterly, and Matuszeski 2011, 246). Therefore, our causal model cannot be taken as a general model of African transitions to sovereign nationhood.
Yet recent scholarship has demonstrated that Africa’s borders are not completely arbitrary. In a small number of cases, colonizers with access to more detailed information tailored borders to preserve the unity of cultural groups, to correspond to the borders of pre-colonial polities, or to accommodate requests of local chiefs (McCauley and Posner 2015, 411; Englebert, Tarango, and Carter 2002, 1096-97). The size and shape of African states appear to reflect colonizers’ rational revenue-maximizing decisions as well, as their size and likelihood of having straight borders are both negatively related to population density and the density of trade (Green 2012). Boundary-making may have been a response to levels of pre-colonial political centralization, as more powerful African polities were better able to resist colonial partition (Englebert, Tarango, and Carter 2002, 1097). Finally, and directly relevant to our search for candidate instances of survivorship bias, colonial powers made decisions about the internal organization and hence boundaries of territories within their zone of control. These considerations lead us to inquire further about two pre-sovereign units that arguably experienced moments of potential sovereignty: the Niger Delta region of Nigeria and the Angolan province of Cabinda. Had either of these two oil-rich provinces remained independent, their influence might have partially offset the false positives generated on the Arabian Peninsula.
By the mid-nineteenth century, the Niger Delta was a major producer of palm oil, used to lubricate industrial machinery. To protect its commercial interests, Britain declared a protectorate over the Niger Delta region in 1885. In 1906, with assistance from the Colonial Office, the Nigerian Bitumen Corporation began to explore for oil in the Southern Protectorate. “You can well imagine,” a senior official in the British Admiralty wrote in 1912, “how satisfactory it will be to secure a good supply of oil for the Navy from a British company [in a British colony] only about twelve days steaming from home” (Carland 1985). Yet shortly after this expression of optimism, the search for oil was ended without any major discoveries, in large part because Churchill was convinced that the Middle East would provide all the oil the Navy needed. Two years later, the Southern Protectorate was amalgamated with the Northern Protectorate and the Lagos Crown Colony into the single administrative unit of Nigeria (Carland 1985).
Had the Niger Delta remained a separate unit, then with the discovery of its major oil reserves in the 1950s, “Nigeria” would have been divided into one large and oil-poor country and one small and oil-rich country. Yet the “failure” of the Southern Protectorate to achieve independence had nothing to do with either the expectation of its oil riches or the initial failure to find them. The British created a single, unified Nigeria in 1914 for both administrative and financial reasons: to take advantage of the economies of scale afforded by a single, rationalized administration and to use the Southern Protectorate’s budget surplus to finance Northern Nigeria’s budget deficits. This amalgamation, moreover, had been anticipated in a report issued in 1898, prior to the initiation of oil exploration (Carland 1985). [Footnote 4] These facts are all inconsistent with our causal model that implies survivorship bias.
A second plausible candidate for survivorship bias is the Angolan province of Cabinda, an enclave entirely surrounded by the Republic of Congo and the Democratic Republic of Congo. While Portuguese colonization of Angola dates back to the sixteenth century, Portuguese control over the Cabinda enclave dates only to the 1885 Treaty of Simulambuco, by whose terms the three resident kingdoms of Cabinda recognized Portuguese sovereignty in exchange for protection. In 1956, Portugal integrated Cabinda into Angola, triggering the formation of a Cabindan independence movement in the early 1960s. The incorporation of Cabinda into Angola, moreover, coincided with the discovery of large offshore oilfields that supply more than half of Angola’s total oil production.
The “failure” of Cabindan independence, however, appears unrelated to its oil endowments. From 1885 until 1956, generations of Portuguese administrators considered Cabinda an integral part of Angola and continuously experimented with different administrative schemes that would overcome the problem of Cabinda’s geographic separation (Martin 1977, 54-55). Furthermore, after the rise of the fascist Estado Novo in 1926, Portuguese officials made concerted efforts to tighten their political and economic control over their African colonies. Viewed in historical perspective, the 1956 formal incorporation of Cabinda into Angola was simply an effort to consolidate control in an era of growing decolonization by other European powers (Duffy 1963, 157, 191-92). That oil was being discovered in very small quantities at the same time appears to have been entirely coincidental; it was not until the 1960s that Portuguese economic planners began to envision mineral-based development in Angola.
To the best of our knowledge, only Brunei bears even a superficial resemblance to our causal model of sovereignty. According to Mukoyama (2020), Britain intervened on two occasions to endorse the independence and eventual sovereignty of Brunei. The first occasion was in 1906, when a British company, the North Borneo Chartered Company, made demands for territorial concessions. The second occasion was in 1961, when Malaya’s prime minister invited Brunei to join the Malayan Federation. We do not consider the evidence adduced by Mukoyama sufficient to establish, however, that British policy was based primarily on concerns for advantageous access to Brunei’s oil wealth or that Brunei would have disappeared in the absence of British intervention. If we were to accept Brunei as a case of survivorship bias, moreover, it would only strengthen the case that existing data sets have over-sampled oil-rich, monarchical autocracies.
Finally, to further establish the uniquely local character of the transition to sovereign statehood that we describe, we can point to two cases of “failure” in the Gulf region that are inconsistent with the causal model. In the early twentieth century, Muhammarah was a Persian city and surrounding countryside at the head of the Arabian Gulf, just a few hundred miles from Kuwait. Oil was found near Muhammarah in 1908, and by 1912, British officials began negotiating to lease part of the island of Abadan as a base for oil refining and shipping (Rutledge 2014, 18). A decade later, following the downfall of the Qajar Empire, Shaykh Khazal, who had been only nominally a vassal of the Persian Shah, asked to become the ruler of an independent polity under British protection. British officials refused, instead allowing Reza Shah Pahlavi to establish central-state hegemony over the region and its oil reserves. Two conditions distinguished Muhammarah from Kuwait and the other Arab principalities. First, Persia’s legally recognized claim to sovereignty, codified by the 1847 Treaty of Erzurum, constrained British officials, who publicly recognized that Shaykh Khazal was a Persian subject. Second, an independent Muhammarah contradicted Britain’s larger security policy, which dictated the maintenance of Persian sovereignty and integrity as a buffer against Russian expansion (Waalkes 1996, 162-63).
The Kingdom of the Hijaz, on the western coast of the Arabian Peninsula, represents a second case of “failure” to achieve sovereign statehood. The local ruler, Sharif Husayn, declared his independence from the Ottoman Empire in 1916 and played a critical role in Britain’s wartime strategy in the Middle East; after the armistice was signed, Britain remained fully committed to Husayn “by obligations both moral and political,” as Lord Curzon proclaimed (Paris 2003, 353). Yet in the mid-1920s, as Saudi expansion threatened the independence of the Kingdom, the British opted not to deter Saudi westward expansion (Teitelbaum 2001). In the monographic literature, the standard explanation for the British withdrawal of protection stresses Husayn’s growing unreliability as an ally, his penchant for adopting maximalist and intransigent positions on boundary negotiations (a “pampered and querulous nuisance,” as one observer put it), his increasingly arbitrary and autocratic rule, and a growing concern that he was no longer compos mentis (Kostiner 1993; Paris 2003, 255; Alangari 1998, 205-8; Troeller 2013). These arguments imply that the failure to become a sovereign country stemmed from contingent reasons not covered by our causal model. However, the evidence contained in these works is not sufficient to fully rule out the alternative explanation that the British were uninterested in an independent Kingdom of the Hijaz because, unlike the principalities of the eastern seaboard, it did not sit atop enormous oil reserves (Rutledge 2014, 65).
In summary, the available evidence is consistent with our model of the path to sovereignty among the GCC-5, but that model does not appear to apply elsewhere. We have reason to believe that, with respect to the political resource curse, the path to sovereignty of the GCC-5 represents a unique manifestation of survivorship bias.
Testing the Resource Curse Hypothesis in Light of Survivorship Bias
We have provided a causal model that implies survivorship bias and provided evidence that the model accurately describes the path to sovereignty for the GCC-5; we have searched for but not found evidence of other manifestations of collider bias. Two implications follow. First, in the counterfactual absence of oil-induced British intervention, the GCC-5 would in all likelihood have been absorbed into an expanding Saudi Arabia. Second, because British policy also established the institutional foundations of durable monarchies among the GCC-5, our causal model contains no direct or indirect causal effect of oil on autocratic durability; our contention is that any observed association in these cases is, at least partially, the product of collider bias. By implication, statistical models that do not control for endogenous selection potentially over-estimate the effect of oil on regime durability. Given that the GCC-5 are extreme outliers on univariate distributions of oil wealth and autocratic durability, our primary expectation is that when we control for survivorship bias, the absolute magnitude of the coefficients on variables measuring oil wealth should be substantially closer to zero. Our claim about the consequences of collider bias does not imply that these coefficients will be statistically indistinguishable from zero, though this may follow from the smaller absolute magnitude of the coefficients.
We cannot correct for collider bias using two standard techniques: selection models and regional dummy variables. Selection models require that a subset of cases in our data are not selected into the final sample; that is not true for any cross-national data set that includes only countries selected into sovereignty. Statistical models that include a regional dummy variable, or even a dummy variable for the GCC-5, would not correct for collider bias; adding control variables to a multivariate model is an appropriate solution only for bias induced by common causes, or omitted variable bias.
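The mechanism at issue can be illustrated with a small simulation. The sketch below is not the causal model of figure 5; it is a generic toy example, with hypothetical variable names and an arbitrary survival rule, in which two traits are statistically independent in the full population but jointly determine which units survive to be observed.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, threshold=1.0, seed=0):
    """Draw two independent traits; keep only units that 'survive'.

    Assumption (illustrative only): a unit survives when the sum of its
    two traits exceeds `threshold`, so survival is a collider of both.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    survivors = [(a, b) for a, b in zip(x, y) if a + b > threshold]
    sx, sy = zip(*survivors)
    return pearson(x, y), pearson(sx, sy), len(survivors)


full_r, survivor_r, n_survivors = simulate()
print(f"all units:      r = {full_r:.3f}")
print(f"survivors only: r = {survivor_r:.3f} (n = {n_survivors})")
```

In this toy rule the association induced among survivors happens to be negative; the sign and size of the induced association depend entirely on how survival is determined. The point relevant here is that a dummy variable for survivors cannot remove the association, because every observed unit is a survivor.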
Our correction for survivorship bias is to compare two states of the world: the actual world, in which the GCC-5 are sovereign countries, and a counterfactual world in which Saudi Arabia annexed the entire eastern Persian Gulf seaboard. The counterfactual world is uncorrupted by British intervention, and so its data-generating process does not induce survivorship bias. We estimate identical statistical models on two data sets, each representing a different data-generating process, and compare the results. Our claim is that British intervention created the institutional foundations for durable monarchies that are highly resistant to regime failure. We therefore test two hypotheses:
H1: Using conventional data sets that do not correct for selection bias, higher levels of oil abundance or dependence will be associated with lower probabilities of autocratic regime failure.
H2: Using data sets that correct for selection bias by representing the counterfactual data-generating process, higher levels of oil abundance or dependence will not be associated with substantially lower probabilities of autocratic regime failure, as the absolute magnitude of estimated coefficients on oil wealth will be substantially closer to zero.
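To make the comparison of data sets concrete, the following sketch shows one plausible way to build counterfactual country-year records by folding the GCC-5 into Saudi Arabia. It is an assumption for illustration, not a description of how the counterfactual data were actually constructed: the field names, the country labels, and the choice to sum oil and gas value and population before recomputing per capita figures are all hypothetical.

```python
from collections import defaultdict

# Hypothetical country labels; an actual data set may use different codes.
GCC5 = {"Kuwait", "Bahrain", "Qatar", "United Arab Emirates", "Oman"}
ABSORBER = "Saudi Arabia"


def build_counterfactual(rows):
    """Fold GCC-5 country-years into Saudi Arabia; copy all other rows.

    Each row is a dict with keys: country, year, oil_gas_value (total USD),
    and population. Totals are summed within each year (an assumption).
    """
    merged = defaultdict(lambda: {"oil_gas_value": 0.0, "population": 0.0})
    out = []
    for row in rows:
        if row["country"] in GCC5 or row["country"] == ABSORBER:
            cell = merged[row["year"]]
            cell["oil_gas_value"] += row["oil_gas_value"]
            cell["population"] += row["population"]
        else:
            out.append(dict(row))
    for year, cell in sorted(merged.items()):
        out.append({"country": ABSORBER, "year": year, **cell})
    return out


def per_capita(row):
    """Oil and gas value per person for a single country-year."""
    return row["oil_gas_value"] / row["population"]


# Made-up numbers, for illustration only.
rows = [
    {"country": "Kuwait", "year": 2000, "oil_gas_value": 100.0, "population": 2.0},
    {"country": "Saudi Arabia", "year": 2000, "oil_gas_value": 300.0, "population": 20.0},
    {"country": "Norway", "year": 2000, "oil_gas_value": 50.0, "population": 5.0},
]
counterfactual = build_counterfactual(rows)
```

Under this construction the small, extremely oil-rich units disappear as separate observations, and their production and population are absorbed into a single, larger unit with a lower per capita figure than the richest of its parts.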
We use two measures of oil wealth. To capture the concept of oil abundance, we include the variable oil & gas income, the value in current U.S. dollars of a country’s annual per capita oil and natural gas production (Ross and Mahdavi 2015). We take the natural logarithm of this measure after adding one-tenth of one cent, so that the logarithm is defined for country-years with no production. [Footnote 5] To represent oil dependence, we include the variable rent leverage, which captures the share of oil and natural gas income in national per capita income (corrected for purchasing power parity, PPP). Rent leverage measures the influence that state-controlled oil revenues have over an average citizen’s daily economic life (Smith 2017). Note that while oil & gas income is a continuous variable ranging from zero to a maximum value of $77,000, rent leverage is a ratio constrained to lie between zero and one. Therefore, models estimated using rent leverage tend to yield coefficients that are more consistently statistically significant and also substantively larger than those from models based on oil & gas income.
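The two transformations can be written compactly. The sketch below is illustrative only: the function and argument names are hypothetical, the offset of 0.001 reads the added constant as one-tenth of one U.S. cent, and rent leverage is computed as a simple ratio, which may differ from the exact construction in Smith (2017).

```python
import math

# Assumption: the constant added before taking logs is $0.001.
OFFSET_USD = 0.001


def log_oil_gas_income(per_capita_usd):
    """Natural log of per capita oil and gas income plus a small offset.

    The offset keeps the transformation defined for non-producers,
    for whom per capita income from oil and gas is zero.
    """
    if per_capita_usd < 0:
        raise ValueError("per capita income cannot be negative")
    return math.log(per_capita_usd + OFFSET_USD)


def rent_leverage(oil_gas_income_pc, income_pc_ppp):
    """Share of oil and gas income in per capita income (both per person)."""
    if income_pc_ppp <= 0:
        raise ValueError("per capita income must be positive")
    return oil_gas_income_pc / income_pc_ppp


non_producer = log_oil_gas_income(0.0)       # finite, equal to log(0.001)
major_producer = log_oil_gas_income(77000.0)  # the maximum cited in the text
example_share = rent_leverage(5000.0, 20000.0)
```

The difference in scale is visible immediately: the logged income measure spans roughly eighteen log points between a non-producer and the largest producer, whereas the ratio measure is compressed into the unit interval, so a one-unit change in rent leverage is the move from no dependence to complete dependence.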
We measure autocratic regime spells and failures using the Autocratic Regimes Data Set compiled by Barbara Geddes, Joseph Wright, and Erika Frantz (2014) and used by the same authors to estimate oil’s effect on intra-autocratic and autocratic-democratic transitions (Wright, Frantz, and Geddes 2015). By coding both types of transitions that could follow autocratic failure (a transition to a new autocracy or a transition to democracy), this data set better captures autocratic regime durability than other data sets that code a binary variable, democracy or autocracy, and thus fail to observe intra-autocratic transitions. Our argument is about autocratic durability, not about democratic transitions; however, autocratic regimes that never fail are also regimes that never undergo a democratic transition. The online appendix includes models estimated using a data set (Cheibub, Gandhi, and Vreeland 2010) that codes dichotomous democratic transitions, with results similar to those we report here. [Footnote 6]
We estimate models using a minimal set of control variables that past scholarship has found to be highly predictive of autocratic failure: GDP per capita (PPP), annual GDP per capita growth, and the number of neighboring democracies. [Footnote 7] Andersen and Ross (2014) conclude that 1980 represents a major discontinuity in the international oil sector, such that the effect of oil should be discernible only after 1980. We thus create a dummy variable for all years after 1980, including it in the model independently and in interaction with our measures of oil wealth (Brambor, Clark, and Golder 2006). We estimate models with a minimal set of control variables because our main objective is to correct for endogenous selection bias by comparing the results of models using two different data sets produced by two distinct data-generating processes. Estimating minimal models is also consistent with best practices that warn against the arbitrary inclusion of controls (Achen 2002; Clarke 2005).
We estimate survival models of autocratic failure because these models do not make any assumptions about “right-censored” regime spells (those that have not experienced failure when observations end) and because they are sensitive to changing hazard ratios across time. We allow for multiple spells within countries and use Cox proportional hazards conditional risk set models. [Footnote 8] Models using logistic regression are included in the online appendix. We estimate two sets of models, with each set representing one of the two alternative data-generating processes. The first set of four models corresponds to the actual data-generating process; these baseline models comport with conventional hypothesis tests of the resource curse that do not correct for selection bias. In these baseline models, we first estimate two models using rent leverage, followed by two models using oil & gas income. The first model of each pair excludes the post-1980 interaction term; the second includes that term. We then repeat the entire exercise using the counterfactual data set; in these models, we use the variables counterfactual RL and counterfactual O&G. Because many scholars are accustomed to scanning regression tables in search of symbols of statistical significance, we repeat that our hypothesis test involves the comparison of models representing different data-generating processes: our hypothesis is that the absolute magnitude of estimated coefficients will be substantially closer to zero when using the data-generating process that corrects for selection bias.
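For readers who prefer notation, the specification described above can be sketched as a stratified Cox model in gap time. The notation is an assumption introduced here for illustration and does not appear in the original analysis: i indexes countries, k indexes the order of the breakdown (the stratum), t is the time elapsed since the previous breakdown, Oil stands for either rent leverage or logged oil & gas income, and Z collects the control variables, all evaluated at the calendar year corresponding to t.

```latex
% Illustrative notation only; symbols are assumptions, not the authors' own.
\[
h_{ik}(t) \;=\; h_{0k}(t)\,
\exp\!\Bigl(
    \beta_1\,\mathrm{Oil}_{i}(t)
  + \beta_2\,\mathrm{Post1980}(t)
  + \beta_3\,\mathrm{Oil}_{i}(t)\times\mathrm{Post1980}(t)
  + \boldsymbol{\gamma}^{\top}\mathbf{Z}_{i}(t)
\Bigr)
\]
% Implied effect of oil on the log hazard of regime failure:
%   up to 1980:  \beta_1
%   after 1980:  \beta_1 + \beta_3
```

Each stratum k has its own baseline hazard, so no functional form is imposed on how the risk of failure changes with the number of prior breakdowns. In the first model of each pair the interaction term is omitted, and the hypothesis test compares the magnitude of the oil coefficients across the actual and counterfactual data sets rather than their significance stars.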
Table 1 presents the results. Models 1–4 correspond to the conventional wisdom. Looking at row four (rent leverage) and row five (oil & gas income), we see that greater oil wealth is statistically associated with a reduced risk of regime failure. Compare these results to those of models 5–8, which represent the counterfactual data-generating process. Looking at row six (counterfactual rent leverage) and row seven (counterfactual oil & gas income), we see estimated coefficients whose absolute magnitudes have been substantially reduced towards zero, by several orders of magnitude in models 5 and 6, and by about one-third in models 7 and 8. It is not simply that these coefficients are statistically indistinguishable from zero (which is the case for three of the four models); more importantly, they are nearly indistinguishable from zero in substantive terms.
Note: *** p<0.01, ** p<0.05, * p<0.10. Estimates are from Cox proportional hazards conditional risk set models (time measured from the previous event). Multiple autocratic breakdowns are allowed, with models stratified by breakdown order. Coefficients are reported, with robust standard errors in parentheses. RL Counter: Rent Leverage Counterfactual; OG Counter: Oil/Gas Income Counterfactual. The coefficients on GDP are reported to the eighth decimal place.
Figures 6 and 7 visualize the effect of correcting for selection bias. In both panels of figure 6, representing the “actual” data-generating process, the effect of oil on survival is large and grows larger over time. Figure 7 shows the opposite. In the left panel, the two survival functions are indistinguishable, while the right panel shows a very small effect (compare the right panels of figures 6 and 7) that does not change over time.
There are two large lessons to extract from this analysis. First, we need to expand our sensitivity to different forms of biased causal inference. Most commonly, one thinks about spurious associations derived from omitted variables; in graph-theoretic terms, these are common causes of treatment and outcome. A full repertoire of research designs exists to mitigate omitted variable bias, and there is a growing recognition of the need to distinguish between bias induced by uncontrolled common causes and post-treatment bias, an entirely separate problem that results from controlling for a mediator rather than a common cause (Paine 2016; Montgomery, Nyhan, and Torres 2018). A second form of bias is measurement error, which has also become a vital area of methodological and applied research (Blackwell, Honaker, and King 2017). A third form of bias is endogenous selection bias. We call this collider bias to underscore that it can be induced without any form of selection on the dependent variable. When confronting threats of biased inference, many of us act on the instinct that more control is better than less; but, counter-intuitively, conditioning on a collider induces bias that would not exist in the absence of conditioning.
The second major lesson is to think broadly about the data-generating process. Narrowly understood, the data-generating process is a summary statement about our regression assumptions. We urge a more expansive view, even a literal one: data are generated by processes, or linked events. They are the consequence of decisions and actions. Some of these decisions and actions are made by researchers, such as decisions about measurement. Others are made by non-academic observers, who decide which subjects to observe and what behavior to record; our data sets may not contain the information we need precisely because of decisions made by the originators of the data (Knox, Lowe, and Mummolo 2020). Still other decisions and actions are made by the people we study, whose actions may destroy old institutions and create new ones. We have shown that such actions quite literally are the data-generating process: by granting sovereignty to some entities, denying sovereignty to other entities, and delineating the borders between them, they constitute the countries that enter into cross-national data sets, along with many of their key features. In some circumstances that cannot be defined a priori, endogenous selection bias may result. This is true for the emergence of the modern European state; it appears to be true for at least some transitions from pre-sovereign entities to sovereign countries. It may be true for sub-national borders and jurisdictions as well.
The cutting edge of research must therefore expand to include more than statistical techniques and research designs; it must include context-specific knowledge, which is vital for generating and testing new causal models of data-generating processes. We should no longer tolerate any friction between “area specialists” and generalists; causal inference is too hard to permit any impediments to the combination of skills and knowledge sets.