Contagion, Confounding, and Causality: Confronting the Three C’s of Observational Political Networks Research

Abstract
Contagion across various types of connections is a central process in the study of many political phenomena (e.g., democratization, civil conflict, and voter turnout). Over the last decade, the methodological literature addressing the challenges in causally identifying contagion in networks has exploded. In one of the foundational works in this literature, Shalizi and Thomas (2011, Sociological Methods and Research 40, 211–239) propose a permutation test for contagion in longitudinal network data that is not confounded by selection (e.g., homophily). We illustrate the properties of this test via simulation. We assess its statistical power under various conditions of the data, including the nature of the contagion, the structure of the network through which contagion occurs, and the number of time periods included in the data. We then apply this test to an example domain that is commonly considered in the context of observational research on contagion: the international spread of democracy. We find evidence of international contagion of democracy. We conclude with a discussion of the practical applicability of the Shalizi and Thomas test to the study of contagion in political networks.


Introduction
Contagion has been found to characterize, for example, individuals' decisions to vote (e.g., Bond et al. 2012; Rolfe 2012), the emergence of civil conflicts across countries (e.g., Maves and Braithwaite 2013), and the spread of democracy across countries (e.g., Starr 1991). It is, however, well known that inferences regarding contagion can be confounded by other dynamics that lead connected units to behave in similar ways (Franzese, Hays, and Kachi 2012). Shalizi and Thomas (2011) formalize and analyze the problem of inferring contagion in the presence of homophily. Contagion refers to the influence connected units have on each other, whereas homophily refers to the tendency for similar units to be connected due to their common traits. The arguments presented by Shalizi and Thomas (2011) apply to any type of dependence of connections on units' traits (e.g., heterophily, whereby dissimilar units tend to form ties), generally referred to as "selection." We follow their terminology and use "homophily" as synonymous with selection. As a running example, we focus on the spread of democracy across countries. The spread of democracy is a question of contagion versus homophily: do connected states influence each other to develop democratic institutions, or do similarly governed states tend to be connected to each other over time? It is also possible that a state's governing choices are a result of an unquantified blend of contagion and homophily. The methods we present and illustrate in this note allow the researcher to test for contagion in a way that is not confounded by the presence of homophily.
Given the need to estimate contagion separately from network homophily, it is important to recognize the circumstances where these two effects are confounded. Shalizi and Thomas (2011) explore this idea in detail, specifically considering the problem of identifying contagion in observational longitudinal network data. They analyze the problem within the causal diagram framework (Pearl 1995) and show that in observational social network data, latent homophily (tie formation that is attributable to unmeasured attributes of units) and contagion are "generically confounded" and cannot be identified without strong parametric assumptions. It is helpful here to build conceptual bridges between what Shalizi and Thomas (2011) refer to as "latent homophily" and conceptual characterizations of the confounding of influence and homophily in political science, specifically by Franzese, Hays, and Kachi (2012). Franzese, Hays, and Kachi (2012) distinguish between two tie formation mechanisms that confound contagion inferences: common exposure, which occurs when an exogenous variable affects both tie formation and the outcome variable, and endogenous selection (or behavior homophily/heterophily), which occurs when tie values at one timepoint depend on outcome values from previous timepoints. Although Shalizi and Thomas (2011) do not explicitly model endogenous selection, the identification problems presented by endogenous selection are equivalent to those presented by latent homophily. Considering our running example, suppose a researcher sought to model the spread of democracy through diplomatic networks (Duque 2018). Latent homophily would confound inferences if, for example, the researcher failed to measure any important cultural, geographic, economic, or security factors that shaped diplomatic relations between countries and future regime type developments, including countries' histories of regime type developments.
More broadly, as political networks research commonly focuses on the factors that explain tie formation (e.g., Minozzi et al. 2020), we suspect that the presence of latent homophily in political network data is quite prevalent.
Shalizi and Thomas (2011) present a few ideas regarding how to make inferences on contagion in observational dynamic network data despite the presence of latent homophily. One of these is a permutation test that requires minimal assumptions regarding the structure of contagion and no assumptions regarding the structure of homophily. Specifically, since the test relies on associations across time-lagged data, it must be assumed that contagion does not completely manifest and then dissipate within a single time period, that is, that contagion effects persist beyond a single time period. The test does not condition on any network structure and does not rely on any assumptions regarding the structure of homophily/selection. In this paper, we implement this permutation test and show, through simulation, that it provides a sensible first step to uncover the presence of contagion in longitudinal social network data. We illustrate the use of the test on the dynamics of contagion of democracy, for which we find evidence.

Shalizi and Thomas Test
Shalizi and Thomas (2011) present a test for contagion that does not condition on the ties between nodes (units). The process is to randomly permute the nodes in a social network into two groups ($J_1$ and $J_2$), and estimate the relationship between the outcome variable in one group and the time-lagged counterpart of the other group while controlling for the time-lagged outcome of the current group. By iterating over all possible (or a large number of) partitions, and averaging over all iterations, "there will be a nonzero predictive ability if and only if there is actual contagion" in the social network. While the power of this test is low when the time series is short, the random partition of nodes into bins assures that the analysis is not confounded by conditioning on ties (i.e., two-node groups) that are themselves potentially formed according to homophily.
The steps of the test are as follows:
1. Given longitudinal network data, randomly partition the nodes into two bins, $J_1$ and $J_2$.
2. Aggregate the outcome variable $Y$ within each bin at each time step, for example, by taking the mean over all the bin nodes. This results in an aggregated time series $\bar{Y}_{J_1}(t), \bar{Y}_{J_1}(t-1), \ldots, \bar{Y}_{J_1}(1)$ for bin $J_1$ and $\bar{Y}_{J_2}(t), \bar{Y}_{J_2}(t-1), \ldots, \bar{Y}_{J_2}(1)$ for bin $J_2$.
3. Estimate the relationship between $\bar{Y}_{J_i}(t)$ and $\bar{Y}_{J_k}(t-1)$, controlling for $\bar{Y}_{J_i}(t-1)$. We use ordinary least squares regression, but other estimators could be used.
4. Repeat Steps 1-3. The total number of partitions possible for equal bin sizes is $\binom{n}{n/2}$.
5. The test for contagion is conducted by calculating empirical p-values with respect to the distribution of estimated relationships between $\bar{Y}_{J_i}(t)$ and $\bar{Y}_{J_k}(t-1)$. A left (right)-tailed p-value is given by the proportion of estimated relationships that are less (greater) than zero.
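The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `split_halves_estimates` and `empirical_p_values` are hypothetical, bins are aggregated by the mean, bin sizes are equal (up to one node), and the regression is run in one direction only (bin $J_1$ at time $t$ on bin $J_2$ at time $t-1$, controlling for bin $J_1$ at time $t-1$), which is harmless because bin labels are assigned at random.

```python
import numpy as np

def split_halves_estimates(Y, n_iter=1000, seed=0):
    """Y has shape (T, n): rows are time steps, columns are nodes.
    Returns one cross-bin lag coefficient per random partition."""
    rng = np.random.default_rng(seed)
    T, n = Y.shape
    estimates = np.empty(n_iter)
    for b in range(n_iter):
        # Step 1: random partition of the nodes into two bins
        perm = rng.permutation(n)
        j1, j2 = perm[: n // 2], perm[n // 2:]
        # Step 2: aggregate the outcome within each bin (mean, by assumption)
        y1 = Y[:, j1].mean(axis=1)
        y2 = Y[:, j2].mean(axis=1)
        # Step 3: OLS of y1(t) on an intercept, y1(t-1), and y2(t-1)
        X = np.column_stack([np.ones(T - 1), y1[:-1], y2[:-1]])
        beta, *_ = np.linalg.lstsq(X, y1[1:], rcond=None)
        estimates[b] = beta[2]  # coefficient on the other bin's lag
    return estimates  # Step 4 is the loop over partitions

def empirical_p_values(estimates):
    # Step 5: proportions of estimates below and above zero
    est = np.asarray(estimates)
    return {"prop_below_zero": float(np.mean(est < 0)),
            "prop_above_zero": float(np.mean(est > 0))}
```

Given a `(T, n)` outcome array `Y`, calling `empirical_p_values(split_halves_estimates(Y, n_iter=10000))` returns the two tail proportions described in Step 5. Sampling partitions at random rather than enumerating them is the practical choice, since the number of equal-sized partitions grows very quickly with $n$.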
Intuitively, this test is designed to detect a diffuse contagion signal whereby, due to the presence of contagion between some of the nodes in the two randomly partitioned groups, the aggregated values across the two groups are not independent. This indirect form of signal detection is necessary to avoid conditioning the contagion estimate on the network structure, which activates the confounding presented by latent homophily.
The Shalizi and Thomas test is a valuable tool in the study of contagion through political networks. It does, however, have a few limitations that are important to note. First, it is a hypothesis test only, allowing one to evaluate the sharp null hypothesis of no contagion. It does not offer estimates of contagion parameters, or even the capacity to test for contagion through specific networks. Second, the contagion signal that the test relies on is the association of outcome values in $J_1$ ($J_2$) with recent values of $J_2$ ($J_1$), controlling for recent values of $J_1$ ($J_2$). The presence of this signal requires that the system embeds memory of the contagion effect. If contagion manifests and then dissipates completely within one time period (something that could happen if the time units are too aggregated), the test will fail to detect a signal. Third, the test can fail in the presence of a form of quasi-contagion that behaves like "interference" as discussed in the experimental literature (Bowers, Fredrickson, and Panagopoulos 2013). If one unit's covariate value affects the outcome value of another unit, this can look, to the Shalizi and Thomas test, like contagion through the outcome variables, but it is actually a more subtle form of cross-unit dependence. To give an example of this dynamic, major policy decisions (e.g., business or trade shutdowns due to the COVID-19 pandemic) made in a country may affect the economy of the country making the decision as well as the economies of other countries (Cronert 2022). This dynamic would look like economic contagion to the Shalizi and Thomas test, but it is actually a form of cross-border economic dependence based on policymaking effects.

Simulation
To evaluate the performance of the Shalizi and Thomas test, we conduct a simulation study in which we vary the time series length, the homophily and contagion conditions, and the network structure. The test was implemented on four separate simulation conditions: one that includes contagion only, one that includes endogenous homophily only, one that includes both contagion and endogenous homophily, and one that includes endogenous homophily plus a time shock to outcome values. The last condition, the time shock, is included to evaluate the test's performance with time-based common exposure. The time shock is tuned to create, on average, a correlation of 0.5 in the Y values of units at the same time. The data generation models we use are similar to the ones used by Shalizi and Thomas (2011). We set the parameter values to assure that all of the data generating processes result in stationary outcome data. The data generation models are outlined below.
Contagion only data:
1. Begin with $n$ nodes in a network, and each node $i$ is assigned a scalar latent variable $X_i \sim U(0, 1)$.
2. Generate directed ties between every pair of nodes $(i, j)$ with probability $\mathrm{logit}^{-1}(-3|x_i - x_j|)$. The smaller the difference between $x_i$ and $x_j$, the higher the probability of a tie between $i$ and $j$. This produces an $n \times n$ adjacency matrix $A$, where $A_{i,j} = 1$ represents a directed tie.
3. Initiate a starting value for the time series data. We use $Y_i(0) = 0.25x_i + N(0, 0.06^2)$.
4. Given the adjacency matrix from Step 2, contagion incorporated time series data is simulated as $Y_i(t) = 0.25x_i + 0.3Y_i(t-1) + 0.7Y_k(t-1) + N$, where $k = 1, \ldots, n$ and $A_{i,k} = 1$.
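The contagion-only steps can be sketched as follows. This is an illustrative reading of the steps rather than the authors' simulation code, and it rests on several assumptions that the text does not settle: self-ties are excluded, the neighbor term in Step 4 is taken to be the mean of lagged outcomes over all $k$ with $A_{i,k} = 1$, and the noise standard deviation in Step 4 is left as an argument (`noise_sd`) because the text does not state it. The function names are hypothetical.

```python
import numpy as np

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate_contagion_only(n, T, noise_sd, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: latent trait for each node
    x = rng.uniform(0.0, 1.0, size=n)
    # Step 2: directed ties, more likely between nodes with similar traits
    p_tie = inv_logit(-3.0 * np.abs(x[:, None] - x[None, :]))
    A = (rng.uniform(size=(n, n)) < p_tie).astype(int)
    np.fill_diagonal(A, 0)  # assumption: no self-ties
    # Step 3: starting values
    Y = np.empty((T + 1, n))
    Y[0] = 0.25 * x + rng.normal(0.0, 0.06, size=n)
    # Step 4: update with own lag and neighbors' lagged outcomes
    out_degree = np.maximum(A.sum(axis=1), 1)  # avoid division by zero
    for t in range(1, T + 1):
        # assumption: neighbor term is the mean over k with A[i, k] == 1
        neighbor_mean = (A @ Y[t - 1]) / out_degree
        noise = rng.normal(0.0, noise_sd, size=n)
        Y[t] = 0.25 * x + 0.3 * Y[t - 1] + 0.7 * neighbor_mean + noise
    return x, A, Y
```

For example, `x, A, Y = simulate_contagion_only(n=50, T=30, noise_sd=0.06)` borrows the noise scale from Step 3 purely as a placeholder. Note that the tie probability peaks at 0.5 when two nodes share the same latent value, since $\mathrm{logit}^{-1}(0) = 0.5$.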
Homophily only data:
Figure 1. DAGs for the contagion versus homophily-only data generation models used for the simulation data.

Results
All of the results presented in this paper reflect a Shalizi and Thomas test run comprising 10,000 partition iterations. The results of our simulation study are presented in Figures 2 and 3. The implementation of the test on the contagion-only data and the contagion-plus-homophily data shows that the test identifies a positive contagion signal. On the homophily-only data and the data with homophily-plus-time shock (presented in the Supplementary Material), the test does not identify a consistent contagion signal: the estimated signal is centered on zero, with some negative bias for short time series. This result is consistent with the negative "Hurwicz" bias that arises with dynamic models fit to short time series data (Franzese, Hays, and Cook 2016; Nickell 1981). In the homophily-only case, the standard deviation of the signal begins to stabilize after around 30 time steps of data.
When there is contagion in the data, the test is more variable the shorter the time series. For time series lengths greater than 20 steps, the signal converges to approximately 0.35, with or without endogenous homophily in the data. In Figure 3, we present summary estimates of the performance of the Shalizi and Thomas test. We summarize the test's performance at the 0.05 and 0.10 (two-tailed) significance levels, and consider both power and Type-1 error. With fewer than 10 time steps, Type-1 error is high and power is quite low, suggesting that this test should simply not be used with a relatively short time series. As a point of comparison, we estimated the correctly specified regression model on the simulated data using ordinary least squares (shown in the Supplementary Material), and found the power to exceed 0.90 even with one timepoint. With more than 10 time steps, Type-1 error is slightly above the nominal significance levels and converges to the nominal levels with a longer time series. Statistical power converges to 1.0 with a long time series.
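To make the power and Type-1 error summaries concrete, the sketch below shows one way such rates could be tabulated from repeated simulation runs. It is an assumption-laden illustration: the text does not say how the two-tailed p-value is formed, so here it is taken to be twice the smaller of the two tail proportions (capped at 1), and the function names are hypothetical.

```python
import numpy as np

def two_tailed_p(estimates):
    """Two-tailed empirical p-value from one run's partition estimates.
    Assumption: twice the smaller tail proportion, capped at 1."""
    est = np.asarray(estimates, dtype=float)
    below = np.mean(est < 0)
    above = np.mean(est > 0)
    return float(min(1.0, 2.0 * min(below, above)))

def rejection_rate(p_values, alpha):
    """Share of simulated datasets in which the null is rejected.
    On data without contagion this estimates Type-1 error;
    on data with contagion it estimates power."""
    return float(np.mean(np.asarray(p_values) <= alpha))
```

Applying `rejection_rate` at `alpha=0.05` and `alpha=0.10` to p-values from homophily-only replicates gives Type-1 error at the two levels discussed above, and applying it to contagion replicates gives power.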
In addition to the true strength of the contagion effect, we expect the performance of the Shalizi and Thomas test to improve as the network becomes more dense, as density determines the degree to which nodes are subject to contagion effects. The results from the contagion-only simulation at varying levels of network density are presented in the Supplementary Material. With relatively low density (0.05) the signal is weaker, but the signal strength levels out with densities of 0.30 or greater.

Application: The Spread of Democracy
When it comes to contagion dynamics, one of the domains of political science that would benefit most from the use of an observational hypothesis test that is not confounded by selection is the study of the contagion of country-level outcomes across international networks. Examples include the spread of specific policies (Towns 2012), civil conflict (Forsberg 2014), and democracy (Epstein 2005). Due to their national scales and substantial human effects, it is difficult and unethical to design randomized experiments that would provide design-based tests for dynamics such as conflict or democracy contagion. The necessity of working with observational data in this context underpins the importance of the Shalizi and Thomas test.
We focus on the international contagion of democracy through the analysis of Polity scores. The Polity IV Annual Time Series, 1800-2018 (Center for Systemic Peace, n.d.) is a database that tracks and compiles regime changes and regime authority in countries with a total population greater than 500,000 in 2018. This database is extensively used in political science to study regime changes and effects of regime authority in countries over time. This is also a classic database used to study the contagion of democracy among networked countries over time, and hence the implementation of the Shalizi and Thomas test on this database is highly relevant.
We implemented the test on a 50-year subset of the democracy "democ" score in the Polity IV data from 1969 to 2018, comprising 118 countries. This democracy index (on an 11-point scale) scores how "institutionalized" democracy is within a nation. While there are different indices that characterize regime type, the democ score is one of the common indices used to study the degree of democracy in countries' governments (Marshall et al. 2002). Due to substantial evidence that the panel of democracy scores is non-stationary, we apply the test to the panel of first differences in democracy scores. The distribution of 10,000 contagion estimates is visualized in the Supplementary Material. The contagion signal was 0.169, and the proportion of estimates under zero (the one-tailed p-value) was 0.005. We find reliable evidence of positive contagion of democracy. This is an important finding, as it complements and replicates the result from several model-based observational studies that democracy spreads through international networks, but we do not rely on a methodology that requires us to (a) identify the network through which it spreads, or (b) select the other, potentially confounding, factors for which to adjust our estimates.
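The first-differencing step mentioned above is simple to express in code. The sketch below is only an illustration of the transformation: the panel values are made up for the example and are not Polity data, and the function name is hypothetical. Rows are years and columns are countries, so differencing runs down the rows.

```python
import numpy as np

def first_differences(panel):
    """panel has shape (T, n): rows are years, columns are countries.
    Returns the (T - 1, n) panel of year-over-year changes."""
    panel = np.asarray(panel, dtype=float)
    return np.diff(panel, axis=0)

# Toy scores for two hypothetical countries over three years
toy_panel = [[3, 7],
             [4, 7],
             [6, 5]]
changes = first_differences(toy_panel)
# changes is [[1., 0.], [2., -2.]]
```

The differenced panel has one fewer time step than the original, which matters for a test whose power depends on the length of the time series.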

Discussion
The presence of contagion dynamics in political processes can have substantive implications. For example, the adoption of an innovative policy solution in one state/city/country could lead to innovation elsewhere, and democratic reforms in one country could eventually lead to a more democratic future beyond that country's borders. Contagion in political turnout means that get-out-the-vote efforts have effects beyond those voters who are directly engaged by activists. As researchers are often limited in their material or ethical capacities to answer questions about contagion experimentally, we are forced to make inferences with observational data. In this paper, we study and apply a hypothesis test for contagion proposed by Shalizi and Thomas that is not confounded by homophily. The test has limitations, but it is, to the best of our knowledge, the only testing framework that can reliably differentiate contagion and homophily in observational data. We see this test as a complement to the use of methods that rely on structural parametric assumptions to model contagion (e.g., Snijders 2017), offering researchers a non-parametric robustness check in testing for contagion. In an application to the international contagion of democracy, we reject the null hypothesis of no contagion, and find evidence for the international spread of democracy.