Identifying Patterns in the Structural Drivers of Intrastate Conflict

Abstract
Quantitative methods have been used to: (1) better predict civil conflict onset; and (2) understand causal mechanisms to inform policy intervention and theory. However, an exploration of individual conflict onset cases illustrates great variation in the characteristics describing the outbreak of civil war, suggesting that there is not one single set of factors that lead to intrastate war. In this article, we use descriptive statistics to explore persistent clusters in the drivers of civil war onset, finding evidence that some arrangements of structural drivers cluster robustly across multiple model specifications (such as young, poorly developed states with anocratic regimes). Additionally, we find that approximately one-fifth of onset cases cannot be neatly clustered across models, suggesting that these cases are difficult to predict and multiple methods for understanding civil conflict onset (and state failure more generally) may be necessary.

Civil wars cause humanitarian crises and change the course of state development, with generational effects (Acemoğlu and Robinson 2012; Black 2013; Englehart 2009; Herbst 1996). As these events can be so catastrophic, many have sought to better understand the drivers of civil war to anticipate their onset, using a range of methodologies (Fearon, Kasara, and Laitin 2007; Goldsmith and Butcher 2018; Goldstone et al. 2010; Hegre et al. 2017). Quantitative predictive methods have shown that models are able to generate better-than-random predictions about future conflict onset, an important finding for researchers and policymakers (Goldstone et al. 2010; Hegre, Nygård, and Landsverk 2021), though temporal variability in predictive accuracy should contextualize model application (Bowlsby et al. 2019).
However, an examination of onset cases used in quantitative models shows that civil war cases are caused by a wide range of factors (Marshall, Gurr, and Harff 2021). For example, the UK experienced civil conflict in 1971 driven by state-led discrimination against Irish Catholics by Ulster paramilitaries and British armed forces (Loyle, Sullivan, and Davenport 2014). Contrast this with the roles of agitated youth and regional instability in the 2011 Egyptian Revolution (Salih 2013) or the 1996 insurgency in Nepal, partly attributable to high rural inequalities (Murshed and Gates 2005).
While global quantitative models have demonstrated utility in predicting civil war onset, the wide variation in drivers suggests that there may be unique patterns or risk profiles that are common to some, but not all, onset cases. It may be that there is not one set of universal structural factors driving instability, but alternative sets of factors that drive vulnerability, each forming a particular constellation. Rather than a single "gas on the rag" (O'Brien 2002) that explains all failures, there may be different types of flammable substances that lead to civil war onset, some particular to levels of development and others reflecting a confluence of factors present in some cases but not others.
In this article, we use cluster analysis to identify patterns in factors that may correlate with civil war onset. We ask: is there evidence that persistent and unique clusters of drivers of civil war exist, calling into question the use of "single models" to predict onset? If we find no persistent driver clusters, that provides further evidence that "single model" approaches to predicting onset should continue to be the focus of research in this field, possibly strengthening their prominence in both research and policymaking communities.
We begin by creating a "base" model, using agglomerative clustering on six theoretically motivated variables that have been shown historically to predict civil war onset, finding three persistent clusters. We test how persistent these clusters are by using alternative model specifications. We find a persistent cluster of drivers of civil war onset associated with young, poorly developed, and anocratic countries, a grouping of onset cases that is persistent across all model specifications. We also find a cluster of countries with above-average levels of development in consolidated regimes that suffer moderately higher levels of horizontal inequalities. This suggests that wealthier, politically consolidated states more commonly have civil conflict onset associated with minority repression. Our third cluster is characterized by higher neighborhood conflicts, younger populations, and higher levels of horizontal inequalities.

Research Design
Clustering is an unsupervised quantified classification technique that aims to discover patterns within data based on the similarity of attributes between observations. All clustering techniques follow similar steps: (1) scaling input data; (2) calculating the "distance" between observations; (3) using clustering algorithms, which help determine how clusters are constructed based on the "distance" between observations; and (4) using the output of the preceding analyses to identify an appropriate number of clusters. Each step can be completed using several techniques. For our base model, we elected to normalize the data using a min-max, 0-1 scale. Next, we used Euclidean distance to measure the similarity between each onset case in a multivariate space. Euclidean distance is the square root of the sum of the squared differences between the input variables for any pair of observations, and it is particularly appropriate for noncategorical variables (Abbott 2014).
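The first two steps can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the analysis: the function names and the toy values (three cases measured on two variables) are hypothetical, and a constant column is mapped to zero by assumption.

```python
import math


def min_max_scale(rows):
    """Rescale every column of `rows` to the 0-1 range (min-max normalization)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column carries no information; map it to 0.0 (assumption).
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


def euclidean(a, b):
    """Square root of the sum of squared differences between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy data: three cases, two variables on very different scales.
toy_cases = [[500.0, 0.45], [2000.0, 0.30], [8000.0, 0.15]]
scaled_cases = min_max_scale(toy_cases)
distance_first_last = euclidean(scaled_cases[0], scaled_cases[2])
```

Scaling first matters because Euclidean distance would otherwise be dominated by whichever variable has the largest raw range.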
Thirdly, we used an agglomerative hierarchical clustering algorithm (more commonly, "Agnes") to group observations. Agnes clustering identifies local patterns in observations, building from the "ground up" by identifying smaller groups of observations that are then merged into bigger groups at each clustering stage (Kaufman and Rousseeuw 2009). Alongside the Agnes clustering algorithm, we used Ward's (1963) linkage method to determine the distance between clusters because it produces more refined and balanced clusters (Wolfson, Madjd-Sadjadi, and James 2004).
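The "ground up" logic can be illustrated with a deliberately naive Python sketch. It is not the Agnes implementation used here: it assumes Ward's criterion is applied by merging, at each stage, the two clusters whose union produces the smallest increase in the within-cluster sum of squares, and the function name and toy points are hypothetical.

```python
def ward_agglomerative(points, n_clusters):
    """Naive agglomerative clustering with Ward's merge criterion.

    Every observation starts as its own cluster. At each stage the two
    clusters whose merger least increases the within-cluster sum of squares
    are combined, until `n_clusters` remain. Returns lists of row indices.
    """
    dims = range(len(points[0]))
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members) for d in dims]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if clusters a and b merge.
        ca, cb = centroid(a), centroid(b)
        squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * squared_distance

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy data: six already-scaled cases forming three tight pairs.
toy_points = [[0.0, 0.0], [0.1, 0.1], [0.5, 0.9],
              [0.6, 1.0], [1.0, 0.0], [0.9, 0.1]]
toy_clusters = ward_agglomerative(toy_points, 3)
```

The exhaustive pairwise search is slow for large samples but adequate for a few dozen onset cases; production implementations update distances incrementally instead.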
Determining the appropriate number of clusters can be achieved using various approaches, including visual heuristics, quantitative indices, and confirmation through domain knowledge (Abbott 2014; Kodinariya and Makwana 2013). We primarily relied upon an approach that creates thirty indices that evaluate different aspects of model behavior using the NbClust R package (Charrad et al. 2020). 1 The output consistently pointed to the use of three clusters in this analysis.
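To make one such index concrete, the sketch below computes the CH index (Caliński and Harabasz 1974) for a given partition. It is a hand-rolled Python illustration, not a call to NbClust; the function name and toy data are hypothetical, and the index is taken in its standard form, the between-cluster dispersion divided by k - 1 over the within-cluster dispersion divided by n - k.

```python
def calinski_harabasz(points, clusters):
    """CH index for a partition given as lists of row indices.

    Higher values indicate clusters that are compact and well separated.
    Undefined (division by zero) if every cluster has zero internal spread.
    """
    n, k = len(points), len(clusters)
    dims = range(len(points[0]))
    overall = [sum(p[d] for p in points) / n for d in dims]
    between = 0.0
    within = 0.0
    for members in clusters:
        center = [sum(points[i][d] for i in members) / len(members) for d in dims]
        # Dispersion of the cluster centers around the overall mean.
        between += len(members) * sum((center[d] - overall[d]) ** 2 for d in dims)
        # Dispersion of the observations around their own cluster center.
        within += sum((points[i][d] - center[d]) ** 2 for i in members for d in dims)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two obvious groups along the first variable.
toy_points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
ch_sensible = calinski_harabasz(toy_points, [[0, 1], [2, 3]])
ch_poor = calinski_harabasz(toy_points, [[0, 2], [1, 3]])
```

In practice the index is computed for each candidate number of clusters, and the value of k with the highest score is the one that index favors.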
We used the Political Instability Task Force's (PITF) event data for our onset cases (Marshall, Gurr, and Harff 2021). Our case selection criteria included ethnic wars, revolutionary wars, and genocides/politicide, and excluded cases that were exclusively adverse regime change. 2 We then reduced the overall sample to first-year onset (avoiding conflict continuation years) with a minimum of two years of peace prior to the civil war onset year.
We selected variables measuring theoretically supported drivers of civil war onset, intentionally selecting from a range of competing alternative explanations in the literature: economic development (Buhaug, Cederman, and Gleditsch 2014; Collier and Hoeffler 2004; Van de Walle 2004), human development (Badie 2000; Rotberg 2002; Urdal 2005), demographics (Bricker and Foley 2013; Dyson 2012; Mesquida and Wiener 1999; Weber 2019), governance (Bogaards 2009; Cook and Savun 2016; Dahl 1971; Mansfield and Snyder 1995), neighborhood effects (Buhaug and Gleditsch 2008; Cederman et al. 2013; Wolff 2011), and state-led discrimination/horizontal inequalities (Gurr 1970; Østby, Nordås, and Rød 2009; Reynal-Querol 2002). The sample covers 191 countries from 1960 to 2012 (see Table 1). 3 The final dataset for the base model includes 76 onset cases. 4
We pursued various strategies to test the sensitivity of our results to model assumptions. 5 First, we altered the distance metric to use Gower's (1971) measure of distance, which accounts for mixed-type data inputs. We also tested an alternative clustering algorithm called "partitioning around medoids" (PAM) and tested alternative data, including the Varieties of Democracy (V-Dem) liberal democracy index (Coppedge et al. 2021) and civil conflict onset sample data from the Uppsala Conflict Data Program / Peace Research Institute Oslo (UCDP/PRIO) (Gleditsch et al. 2002; Gleditsch et al. 2021). Finally, we added two additional variables that have been identified as drivers of civil war onset but are less policy-relevant: population size and peace years. This analysis validates our findings.
Table 2 and Figure 1 summarize the base model. Radar plots show the clusters' mean value by variable, the full civil conflict onset sample mean value, and the full sample mean for both onset and non-onset cases. 6
The top-left cluster in Figure 1 highlights civil war onset cases where the state has low levels of economic development, high infant mortality, young populations, low levels of state-led discrimination, few neighboring conflicts, and anocratic governance. The onset case that best represents the conditions described in this cluster is Guatemala in 1966, when communist insurgents launched a protracted guerilla conflict against the ruling military regime (Johnson 1966). This cluster is the second largest of the three, containing 39.47 per cent of civil conflict onset cases (N = 30). We describe this cluster as: Anocratic-Younger-Less Developed.

1 Each index evaluates a different number of clusters based on a specific quantitative threshold. This includes the CH index (Caliński and Harabasz 1974), which evaluates cluster validity based on the ratio of between-cluster dispersion to within-cluster dispersion, summed over all clusters. Gap statistics (Tibshirani, Walther, and Hastie 2001) compare the total intra-cluster variation between observed data and reference data with a random uniform distribution (a distribution with no obvious clustering) for different numbers of clusters. For further confirmation, we also visualized the suggested number of clusters through a dendrogram and a scatter plot based on the first two principal components of the data (Kassambara and Mundt 2020; Lê, Josse, and Husson 2008).
2 For the full list of PITF cases included in our sample, see Table 1.1 in the Online Appendix.
3 For a detailed description of variables, see Table 2 in the Online Appendix.
4 For descriptive statistics of variables for onset cases, see Table 3 in the Online Appendix.
5 For model and data specifications used for sensitivity analysis, see Table 5 in the Online Appendix.
6 Boundaries of radar plots represent minimum and maximum mean values among all five sample groups, including the full civil conflict sample, the global sample for both onset and non-onset cases, and three identified clusters.
The top-right cluster in Figure 1 includes cases where the state has consolidated governance (either democracy or autocracy), gross domestic product (GDP) per capita above the global mean, low infant mortality, older populations, and more peaceful neighborhoods. However, it is notable that state-led discrimination in these states is somewhat high relative to the global mean, as well as the conflict-onset mean. This cluster is the smallest of the three in terms of onset cases grouped within it (17.11 per cent [N = 13]). The most central conflict case here is China in 2009, where Uyghur minorities launched violent actions against the government and Han Chinese residents in and around the Xinjiang province, such as the Urumqi riots (Hastings 2011). The characteristics of the states in this cluster at the time of their civil conflict onset can be described as: Older-Wealthier-Moderate Discrimination.
The final cluster in Figure 1 is characterized by states with younger populations, severe levels of neighborhood conflict, and very high levels of state-led discrimination. This cluster is the largest of the three that we found among our PITF sample of civil conflict onset cases, containing 43.42 per cent of total cases (N = 33). The most central onset case in this cluster is Sudan in 1983, wherein the Sudan People's Liberation Army in the country's more Christian southern regions launched an insurgent war against the majority Muslim Arab government in Khartoum, which sanctioned genocidal actions by progovernment militias in the region (Scott 1985). This cluster's descriptive characteristics motivate us to describe the group as: Higher Discrimination-Worse Neighborhood-Younger. 7
These three clusters are persistent across various model specifications, with the Older-Wealthier-Moderate Discrimination cluster showing the lowest level of case variation across alternative clustering methods. The Anocratic-Younger-Less Developed cluster was the second most robust across model specifications, followed by the Higher Discrimination-Worse Neighborhood-Younger cluster. 8

7 More detailed summary statistics can be found in Table 5 in the Online Appendix C.
8 For results of each cluster analysis, see Table 6 in the Online Appendix; for a summary of the persistence of each clustering approach, see Table 7 in the Online Appendix; and for visualizations of the sensitivity analysis, see Figures 1-4 in the Online Appendix.
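One way to pick out a "most central" case within a cluster is sketched below in Python. The text does not specify how centrality was computed, so this rests on an assumption: the most central case is taken to be the medoid, meaning the member with the smallest total Euclidean distance to the other members. The function name, case labels, and values are hypothetical.

```python
import math


def most_central_case(points, members):
    """Return the index of the member closest, in total, to all other members.

    Assumption: "most central" is read as the cluster medoid.
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(
        members,
        key=lambda i: sum(distance(points[i], points[j]) for j in members),
    )


# Hypothetical toy data: four scaled cases, three of which form one cluster.
labels = ["case A", "case B", "case C", "case D"]
toy_points = [[0.0, 0.0], [0.4, 0.0], [1.0, 0.0], [5.0, 5.0]]
central_index = most_central_case(toy_points, [0, 1, 2])
central_label = labels[central_index]
```

A centroid-based definition (the member nearest the cluster mean) would be an equally plausible reading and can select a different case when a cluster is skewed.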
We also examine how individual onset cases behave across models by comparing the cluster assignment of cases across all eight model specifications. For forty-four of the seventy-six cases (57.9 per cent), there is uniform agreement on cluster assignment, regardless of model specification. Conversely, in 19.7 per cent of cases (fifteen of seventy-six), onset cases do not neatly cluster for at least two out of eight model specifications, indicating broad disagreement (see Table 8 in the Online Appendix). For all robustness and sensitivity checks, see the Online Appendix.
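The case-level comparison can be illustrated with a short Python sketch. It assumes that cluster labels have already been matched across specifications (labels from separate runs are otherwise arbitrary) and reads "do not neatly cluster" as departing from a case's most common assignment in at least two specifications. Both are assumptions, and the function name and toy cases are hypothetical.

```python
from collections import Counter


def agreement_summary(assignments, min_dissent=2):
    """Split cases into uniformly assigned and contested groups.

    `assignments` maps a case name to its cluster label under each model
    specification, with labels already aligned across specifications.
    """
    uniform, contested = [], []
    for case, labels in assignments.items():
        modal_count = Counter(labels).most_common(1)[0][1]
        dissenting = len(labels) - modal_count
        if dissenting == 0:
            uniform.append(case)
        elif dissenting >= min_dissent:
            contested.append(case)
    return uniform, contested


# Hypothetical toy data: three cases under eight model specifications.
toy_assignments = {
    "case A": ["c1"] * 8,
    "case B": ["c1"] * 7 + ["c2"],
    "case C": ["c1"] * 5 + ["c2"] * 2 + ["c3"],
}
uniform_cases, contested_cases = agreement_summary(toy_assignments)
```

Cases that dissent only once, such as the hypothetical "case B", fall into neither group, which is why the uniform and contested shares reported above do not sum to 100 per cent.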

Discussion and Conclusion
Do states experience civil conflicts for similar reasons? Or, are there diverse, yet persistent, sets of conditions under which they arise? We find evidence that persistent clusters do appear to exist but that these are not uniformly applied to all cases. In our analysis, we found that nearly one-fifth of onset cases move substantially across clusters when using different techniques. These hard-to-anticipate cases of civil conflict may be some of the most challenging for policymakers to respond to, as they display few signs of impending danger and can be difficult to react to without prior contingency plans in place. Conflicts like these, which may appear without clear structural warnings, can lead to significant humanitarian crises, destabilizing regions as neighbors are taken by surprise.
However, while not all civil conflict onset cases neatly cluster across single or multiple methods, we can point to persistent patterns of characteristics across model specifications. Such persistence shows that although states do not descend into civil conflict for the same set of reasons, there are common characteristics across geography and time. These clusters may provide new insights for scholars as to the descriptive conditions of fragility and shine a light on why some cases descended into civil conflict, even if they did not appear fragile in the ways highlighted in prior literature.
We find that well-established drivers of civil conflict onset feature prominently in this analysis, but they are contextualized in relationship with other drivers of civil war onset. For example, anocratic states are particularly vulnerable to civil war (Goldstone et al. 2010), and recent research has highlighted how contemporary democratic backsliding could make the United States vulnerable to civil war in the near future (Walter 2022). However, this article shows that anocracy is associated with civil war onset when coupled with such factors as a high youth population in poorly developed states, while civil wars in relatively wealthier states are driven not by mixed regimes, but instead by minority discrimination. Persistent clusters of characteristics across civil conflict onset help demonstrate that failures can be understood not only as multidimensional, but also as neither unique nor uniform in the structural features present when they occur. This finding should lend aid to future works studying state failure, while also building a better understanding of the dynamics of these events and encouraging more robust efforts at anticipating them.
The next generation of civil war studies should be sensitive to the limitations of single global models to predict onset, which may be improved with better understanding of concurrent structural drivers. If states do, in fact, fail for distinct reasons, future studies may benefit from modeling distinct "risk profiles" and blending multiple methods of exploration to further unpack the assorted nuanced explanations for civil war onset. We find that while some onset cases cluster neatly and consistently, a large portion (about 20 per cent) are more difficult to classify, suggesting that researchers exploring civil war should expect the unexpected. Future work may wish to keep this finding in mind, as it could affect both prediction and policy prescriptions for anticipated conflicts.
We recognize that the study of civil conflict onset will remain a complex and contentious space. Using slowly moving structural factors to predict rare events will raise substantive research issues related to causality, endogeneity, and levels of analysis. While these challenges will persist, we hope to create a space that bridges a gap between rigorous models that predict conflict onset and case-study approaches that trace causal mechanisms. We hope this "liminal" space can be used creatively to further the study of conflict onset to reduce its prevalence and destructive capacity in the future.