Inference in Linear Dyadic Data Models with Network Spillovers

When using dyadic data (i.e., data indexed by pairs of units), researchers typically assume a linear model, estimate it using Ordinary Least Squares and conduct inference using ``dyadic-robust"variance estimators. The latter assumes that dyads are uncorrelated if they do not share a common unit (e.g., if the same individual is not present in both pairs of data). We show that this assumption does not hold in many empirical applications because indirect links may exist due to network connections, generating correlated outcomes. Hence, ``dyadic-robust'' estimators can be biased in such situations. We develop a consistent variance estimator for such contexts by leveraging results in network statistics. Our estimator has good finite sample properties in simulations, while allowing for decay in spillover effects. We illustrate our message with an application to politicians' voting behavior when they are seating neighbors in the European Parliament.


Introduction
Dyadic data is categorized by the dependence between two sets of sampled units (dyads). For example, exports between the U.S. and Canada depend on both countries (and, plausibly, their characteristics). This contrasts to classical data in the social sciences that only depends on a single unit of observation (e.g., the GDP of the U.S., or a politician's vote in a roll-call).
The empirical relevance of dyadic data is showcased by its widespread use, which has increased over the past two decades (Graham (2020a) provides an extensive review). For example, applications are found in political economy (correlation in voting behavior in Parliament across seating neighbors, Harmon et al. (2019)), international political economy and trade (export-import outcomes across countries, Anderson and van Wincoop (2003)), international relations (Hoff and Ward (2004), for a salient example), among many others. In fact, dyadic data is considered to be dominant in quantitative international relations (Poast, 2016). In these examples, applied researchers typically model the dependence between dyadic outcomes and observable characteristics using a linear model, which they then estimate using Ordinary Least Squares (OLS). However, inference on such estimators for the linear parameters is more complex.
The main approach in recent applied work has been the use of the so-called "dyadicrobust" estimators (e.g., Cameron et al. (2011), Aronow et al. (2015), and Tabord-Meehan (2019), among others). Such estimators build on the widely-used assumption in dyadic data that the error terms for dyad (i, j) and for dyad (k, l) can only be correlated if they share a unit (see Aronow et al. (2015) and Tabord-Meehan (2019) for a discussion; Cameron and Miller (2014) for a review).
In this paper, we first argue that such an assumption does not hold in many applications using dyadic data where dyads may be indirectly connected along a network. 1 Figure 1 presents a simple example in the context of politicians in Congress, whose votes or decisions depend on their seating neighbors. It is completely possible that behavior across dyads (A, B) and (C, D) might be correlated along unobservables because they have many indirect connections (in the figure, through A sitting next to B, who sits next to C). 2 We show that such spillovers invalidate the assumptions for consistency of existing "dyadic-robust" variance estimators through generating interdependence, implying that they are biased for the true asymptotic variance when dyads may be correlated even when they do not share a 1 This is a concrete class of applied examples where the assumption fails. The possibility that crosssectional dependence in dyadic data might be more extensive than assumed has been pointed out by Cameron and Miller (2014) and Cranmer and Desmarais (2016). 2 We expand on these examples in the next section. Such spillovers could be further rationalized as individual-level unobserved heterogeneity: e.g., an unmeasured preference for voting Yes, or a preference for trading with a certain country (see Graham (2020b) and references therein for details). common unit.

Figure 1: Hypothetical Example of a Network in Parliament
Notes: The left figure shows a hypothetical example of politician networks based on seating arrangements: A sits beside B, who sits beside C, who sits beside D. The right-hand figure illustrates the resulting network among active dyads. As dyads (A, B) and (B, C) share a unit, they are indirectly linked in the dyadic network. However, though dyads (A, B) and (C, D) do not have a politician in common, they might still be correlated through two indirect links: namely, B sits beside C, who sits beside D. Hence, D's actions can affect politician A.
To deal with these issues, we develop a consistent variance estimator that explicitly accounts for such network spillovers even with dyadic data, thereby complementing existing approaches (e.g., Aronow et al. (2015)). 3 We prove that our proposed variance estimator is consistent for the true variance of the OLS estimator in linear models with dyadic data when the cross-sectional dependence follows an observed (exogenous) network. Our main insight is that the dependence across all dyads, including indirect spillovers, can be rewritten as correlations across a specific network over dyads. This allows us to apply the framework of Kojevnikov et al. (2021) to such network random variables, although here it is a network over dyads, rather than individuals. Monte Carlo simulations show that our proposed estimator has good finite sample properties and outperforms other estimators for the relevant contexts.
To help practitioners, we then provide a step-by-step guideline on whether our estimator may be appropriate to their context. As we describe, this choice depends on: (i) whether spillovers from indirectly connected dyads are likely to be present, (ii) whether the researcher observes/constructs the network among dyads through which spillovers propagate, and (iii) whether those spillovers are likely to be persistent. Our variance estimator is consistent for the asymptotic variance of the OLS estimator even under (i)-(iii). And our estimator can account for decay in propagation, as Corollary 3.1 and Example 3.1 illustrate. On the other hand, if decay is very high, or spillovers along the network do not exist, then those same results imply that the estimator of Aronow et al. (2015) will also be consistent.
Finally, we illustrate the extent to which neglecting network spillovers with dyadic data may bias inference results. Beyond Monte Carlo simulations, we revisit the application in Harmon et al. (2019) of voting in the European Parliament. The authors study whether random seating arrangements (based on naming conventions) induce neighboring politicians to agree with one another in policy votes. The outcome, whether politicians i and j vote the same way on a policy, is dyadic in nature. However, i and j's votes may be positively correlated even if they are not neighbors: for instance, i and j may sit on either side of common neighbors k and l, who influence them both, and this seating arrangement is observed. This chain of influences is sufficient to induce strong positive correlation across non-dyads. We show that neglecting such higher order spillovers has significant empirical consequences: their estimated variance using the estimator in Aronow et al. (2015) is roughly 22% smaller than using our consistent estimator accounting for such spillovers; while the estimate based on the Eicker-Huber-White estimator ignoring spillovers is approximately 73% smaller than our proposal, consistent with the arguments of Erikson et al. (2014).

Related Literature
The use of dyadic data in Political Science has a rich history, particularly in International Relations. However, empirical challenges with such models are well known -see Poast (2016) for a historical overview. Early on, the concerns were mostly about model specification, including the error term. This includes the 2001 special issue of International Organization, mostly focusing on the use of fixed effects. More recently, Erikson et al. (2014) pointed out that ignoring dependence across dyads can lead to erroneous hypothesis testing, as computed standard errors would be too small. Hoff and Ward (2004) and Minhas et al. (2019Minhas et al. ( , 2022 suggest including random coefficients and latent variables to account for dependencies across dyads. Our approach explicitly accounts for the whole network of interdependencies across dyads, which can go beyond third-order dependences (assumed in Minhas et al. (2019Minhas et al. ( , 2022). It does so by using asymptotic inference, rather than Bayesian (Minhas et al., 2019(Minhas et al., , 2022 or randomized inference (Erikson et al., 2014).
As a result, our paper is directly related to the literature on (asymptotic) inference in regression analysis with dyadic random variables. Aronow et al. (2015) and Tabord-Meehan (2019) consider OLS estimation and inference in a linear dyadic regression model. Meanwhile, Graham (2020a) and Graham (2020b) explore a likelihood-based approach to dyadic regression models, while Graham et al. (2022) and Chiang and Tan (2022) provide results for kernel density estimation in dyadic regression models. 4 While useful to allow for correlations along time and within such groups, this separable structure may be inappropriate for environments where spillovers follow a complex form of dependence along a network. As alluded to previously, all such papers assume that dyads are uncorrelated unless they share a common unit, thereby ignoring dependence across indirectly linked dyads, while our estimator explicitly captures the higher order correlation across dyads along the dyadic network.
However, we emphasize that neither approach subsumes the other. The papers cited above leave the dependence within "clusters" (groups of dyads that share units) unrestricted, but assume independence across such clusters. This is akin to the literature with oneway clustering (e.g., Hansen and Lee (2019)). By comparison, our approach restricts such dependence among groups of dyads that share units (i.e., dependence is assumed to follow the observed network), but allows for dependence across such "clusters" of dyads along the dyadic network. 5 Which approach to pursue using dyadic data depends on the researchers' applications and how they fit such assumptions.

Set-up
Assume that we observe a cross section of N ∈ N individuals located along a network -the latter interpretable as politicians, countries, firms, or other observation units depending on the context. The dyads present in the N -individual network (i.e., among the N N −1 possible dyads), are called active dyads, so that the dyad for two units i and j (e.g., politicians, countries) is denoted as some m. The set of active dyads is denoted M N and M denotes the cardinality of that set.
We assume that each dyad m is endowed with a triplet of dyad-specific variables, forming a triangular array {(y M,m , x M,m , ε M,m )} m∈M N with respect to M , where y M,m ∈ R is a one-dimensional observable outcome, x M,m ∈ R K is a K-dimensional vector of observable characteristics with K ∈ N, and ε M,m ∈ R is a one-dimensional random error term that is not observable to the researcher. We only consider exogenous network formation and the network is assumed to be observable. These conditions are summarized in the following assumption: Assumption 2.1 (Exogenous and Observable Dyadic Networks). The network among dyads is assumed to be conditionally independent of {ε M,m } m∈M N . Furthermore, this network among the N individuals is assumed to be observable.
While such assumptions are standard in models of dyadic networks, they seem particularly appropriate when units or dyad pairs are linked across geographical, physical, or ex-ante social relations (e.g., family ties). This includes capturing neighboring and regional spillovers across countries, as often done in international relations, or exogenous seating arrangements in Parliament, as illustrated in the examples in the next section.
The subsequent arguments require us to distinguish between a pair of dyads who share a member (i.e., who are directly linked -which we call, adjacent) and a pair of dyads who are directly or indirectly linked (which we call, simply, connected ).
Definition 1 (Adjacent & Connected Dyads). Two active dyads m and m are said to be adjacent if they have an individual in common; and they are called connected if they are linked through pairs of adjacent dyads.
In Figure 1, dyad (A,B) is adjacent to (B,C), and connected with, though not adjacent to, (C,D). Hence, the adjacency relationship constitutes a network structure among active dyads, and thus, a network over individuals can be transformed to one over active dyads. For example, the right-hand side panel of Figure 1 provides a network over pairs of voting politicians (i.e., active dyads). 6 We define the geodesic distance between two connected dyads m and m to be the smallest number of adjacent dyads between them. Note that adjacent dyads are a special case of connected dyads with geodesic distance equal to one.

Set-up & Identification
The cross-sectional model of interest takes the form of the linear network-regression model: where Cov(ε M,m , ε M,m | X M ) = 0 unless m and m are connected, and β is a K × 1 vector of the regression coefficients and X M denotes the M × K matrix that records the observed dyad-specific characteristics, i.e., X M := [x M,1 , . . . , x M,M ] .
In this paper, we assume that β is identified, which follows from standard assumptions on strict exogeneity, lack of multicollinearity and the existence of finite second moments of y M,m and x M,m . (For completeness, see Assumption B.1 and Proposition B.1 in Appendix).
We note that equation (2) allows for there to be spillovers across the error terms even when dyads m and m are not adjacent, as long as they are connected through indirect links. By comparison, applied researchers such as Harmon et al. (2019) and Lustig and Richmond (2020) (and the estimators of Aronow et al. (2015) and Tabord-Meehan (2019)) consider a variant of the linear regression (1) under the assumption with m = d(i, j) representing the dyad between i and j. This specific assumption would be equivalent to setting: Cov(ε M,m , ε M,m | X M ) = 0 unless m and m are adjacent.

Examples
Whether to allow indirect spillovers as in (2) or not (as in (4)) depends on the researchers' applications. We now present two examples where our approach may be preferable.
Example 2.1 (Gravity Model of Bilateral Trade Flow). A researcher is studying the trade flow from country i to j, with (log) exports from i to j denoted y ij . Following the literature, (s)he assumes y ij follows the structural gravity equation (e.g., Eaton and Kortum (2002); Anderson and van Wincoop (2003); Melitz (2003); Helpman et al. (2008)): where z ij represents a dyadic characteristic of i and j, such as the shipping cost, whether both countries are democratic (e.g., Mansfield et al., 2000), or whether both participate in WTO/GATT (e.g., Gowa and Kim, 2005); k g ki y ki is the amount i spends on imports (g ki equals one if country i purchases goods from country k and zero otherwise), and η ij captures unobserved heterogeneity pertaining to the trade flow between countries i and j. To see our main point, suppose there are only four countries (1, 2, 3 and 4) which trade, where country 1 exports to country 2, which in turn exports to country 3, and country 3 exports to country 4. Equation (5) then simplifies to: y 12 = α + βz 12 + η 12 , y 23 = α + βz 23 + γy 12 + η 23 , and henceforth. Rearranging these equations implies that the trade flow from country 3 to 4 can be written as: y 34 = α + αγ + αγ 2 + γ 2 βz 12 + γβz 23 + βz 34 + γ 2 η 12 + γη 23 + η 34 .
Therefore, Cov(y 12 , y 34 | z) = γ 2 V ar(η 12 | z) = 0, where z ≡ {z 12 , z 23 , z 34 }. Hence, there can be non-zero correlation between trade flows y 12 and y 34 even if they do not have a country in common. This is because an idiosyncratic shock to an upstream country can propagate through the trade network.
Example 2.2 (Legislative Voting). A researcher is interested in whether seating arrangements in legislatures can affect a politician's behavior, y i (e.g., propensity to vote "Yes" on a roll-call, as Harmon et al. (2019), or the amount of co-sponsoring, as Saia (2018); Lowe and Jo (2021), among others). For concreteness, suppose there are four politicians with the seating arrangements given by Figure 1.
The researchers posit that i's behavior can be influenced by the (average) of its seating neighbors' own voting behavior through a parameter γ as follows: If γ = 0, A is affected by their neighbor B, while B is affected by both of its neighbors (A and C) and so forth. The researcher is interested in whether neighbors' decisions are more highly correlated than the decisions among non-neighbors. Denote y ij as the dyadic outcome of interest (e.g., a measure of correlation between i and j's decisions). Both y AB and y CD involve y B and y C , which are themselves a function of η B and η C . Hence, Cov(y AB , y CD ) = 0, even if the two pairs of legislators do not share a common member.

Estimation
Throughout this paper, we focus on the Ordinary Least Squares (OLS) estimator of β, denoted byβ. Under the assumptions above, we can writê It is straightforward to verify thatβ is unbiased for β under our identification conditions (Assumption B.1). However, a consistency result is by no means trivial due to the dependence along the network which induces a complex form of cross-sectional dependence, hindering a naïve application of the standard theory for independently and identically distributed (i.i.d.) random vectors. This is clarified in Section 3.1. Before that, we provide a heuristic overview of the main point of our paper in this setting.

Inference
Inference about β is based on a normal approximation of the distribution ofβ around β. We focus on hypothesis testing conducted using the expression: where V ar(β) is a consistent estimator of the asymptotic variance ofβ. Our main result in Section 3.4 is providing such an appropriate estimator, which takes the form: where κ m,m is an appropriate kernel function that will formally be defined in Section 3.4; h m,m represents an indicator function that takes one if dyads m and m are connected and zero otherwise; andε m := y m − x mβ . This paper derives conditions under which V ar(β) is consistent for the asymptotic variance ofβ. Before doing so, let us compare the variance estimator (10) with an often used estimator based on one-way clustering of dyad groupings.
Remark 2.1 (Dyadic-Robust Variance Estimator). An increasing number of applied researchers, such as Harmon et al. (2019) and Lustig and Richmond (2020), estimate model (1) and conduct inference, using the following dyadic-robust variance estimators proposed by Aronow et al. (2015) and Tabord-Meehan (2019): where 1 m,m equals one if dyads m and m are adjacent and zero otherwise.
Note that the use of the dyadic-robust variance estimator sets cases in which two dyads are not adjacent, but connected, to zero. Meanwhile, our estimator (10) accounts for network spillovers by accommodating the correlation across both adjacent and connected dyads. 7 As our examples above suggest, the structure of the variance estimator (11) may not be compatible with indirect spillovers in some settings, which should assume the specification (2) instead. This suggests that the dyadic-robust variance estimator may be inconsistent when there is cross-sectional dependence along a network over dyads: i.e., when non-adjacent dyads can still affect the correlation structure and outcomes of dyad m. 8 This conjecture is formally proven in Corollary 3.1 and illustrated in Monte Carlo simulations in Section 4. We note that this is a feature of applying such dyadic-robust variance estimators to network spillovers, and not a feature of those estimators per se.
In order to validate the test statistic (9) for inference, this paper establishes the consistency and asymptotic normality ofβ and develops a consistent network-robust estimator for the asymptotic variance when the correlation structure is given by (2).

A Step-by-Step Guide on Implementation
Researchers interested in the linear specification (1) may find the following guidelines useful when deciding whether to use our proposed estimator (10).
1. The researcher should first ask whether spillovers from indirectly connected dyads are 7 See Definition 1 and the subsequent discussion. The choice of kernel and lag-truncation is discussed in Section 3.4.
8 Clustering estimators may be inappropriate when the correlation structure has network spillovers as in (2), because each agent has a complex (i.e., non-separable) structure of connections, reflected in a nonseparable network across dyads. If the network model features positive spillovers, then the dyadic-robust variance estimator will likely underestimate the true variance, leading to conservative hypothesis testing. Meanwhile, it is likely to overstate the true variance when there are negative spillovers. We expand on this point in our numerical simulations. likely to be present (and not decay immediately) in their set-up: i.e., is equation (2) a more appropriate assumption than equation (4)?
While this depends on the specific application, Examples 2.1-2.2 illustrate models where that is likely to be the case. And condition (17) provides a notion of how much persistence is needed for a bias to appear. As we show below, these insights are robust to decaying spillover effects (see Corollary 3.1, Example 3.1, and the associated simulation results).
2. If such spillovers of unobservables are likely to exist, are they governed by an exogenous and observable network (Assumption 2.1), such as physical, geographical or social (e.g., family ties)? 9 If so, the proposed estimator is appropriate under regularity conditions.
3. One can implement our estimator by: (i) choosing a kernel (e.g., rectangular, see Section 3.4), (ii) setting the lag-truncation, b M (either by a known value, or adaptively by b M = 2 log(M )/ log(max(average degree, 1.05)), where M is the number of dyads and we use the average degree of the dyadic network, (iii) plugging-in those choices into equation (10).
This estimator is consistent under regularity conditions, even when spillovers decay, and shows good finite-sample properties in the simulations below.

Network Dependent Processes
Let Y M,m be a random vector defined as and denote C M := {x M,m } m∈M N . 11 From equations (8) and (12) we can writê 9 Alternatively, one would have to consider recovering unobserved exogenous networks (see ? for an example with panel data) or modeling endogenous network formation, possibly with unobserved links (see ?? for two examples in political economy).
10 By construction, the collection of Y M,m 's constitutes a triangular array of random vectors. 11 For the case of stochastic networks, it is defined to include information about the network topology as well as the collection of the dyad-specific attributes {x M,m } m∈M N . See Kojevnikov et al. (2021).
Our interest lies in proving the asymptotic properties ofβ taking advantage of the expression (13). However, the presence of ε M,m in Y M,m , which is allowed to be correlated along the network over active dyads, renders our approach nonstandard. The canonical results about independently and identically distributed (i.i.d.) random variables are by no means applicable due to the complex form of dependence inherent to ε M,m . Dyadic-robust estimators (e.g., Aronow et al. (2015) and Tabord-Meehan (2019)) only account for correlations between adjacent dyads. This may be appropriate in some settings, but not when there may be indirect spillovers, as in specification (1) -(2). Thence, the right hand side of (13) cannot simply be embedded in other widely-studied variants of dependent random vectors, including spatially-correlated and time-series data.
The main insight of this paper is that the spillovers across connected -even if not adjacent -dyads, can be rewritten as the dependence of Y M,m 's along the network of active dyads (hereby, referred to as the "network"). This transformation allows us to embrace such complex cross-sectional dependence and appropriately rewrite the problem so that recent results on network dependent random variables (Kojevnikov et al., 2021) can be applied. Our results then allow for correlations of random variables given a common shock as explicitly dictated by distances between them on the dyadic network. Our notion of dependence restricts the covariance between the random variables for any two sets of dyads A and B, Y M,A and Y M,B , which are at a distance s from one another, to be bounded and to decrease as s goes to infinity. Indeed, this covariance is controlled by a sequence of (random) coefficients θ M,s , which capture the strength of covariation net of scaling, analogous to correlation coefficients. As the minimum distance, s, between A and B grows, the dependence {θ M,s } between Y M,A and Y M,B , tends to zero. A formal description is provided in Appendix A.

Definitions
As will become transparent shortly, asymptotic theories forβ rest on tradeoffs between the correlation of the network-dependent random vectors (i.e., the dependence coefficients) and the denseness of the underlying network. To measure the denseness, we first define two where ρ M (m, m ) denotes the geodesic distance between dyads m and m . 12 The former set collects all the m's neighbors whose distance from m is no more than s (which we call a neighborhood), whilst the latter registers all the m's neighbors whose distance from m is exactly s (which we call a neighborhood shell). Next, we define two types of density measures of a network: for k, r > 0, where it is assumed that M N (m ; −1) = ∅. The former measure gauges the denseness of a network in terms of the average size of a version of the neighborhood. The latter expresses the denseness of a network as the average size of the neighborhood shell raised by k. Kojevnikov et al. (2021) show that controlling the asymptotic behavior of an appropriate composite of these two measures (denoted by c M and defined in Assumption 3.6) is sufficient for the Law of Large Numbers (LLN) and Central Limit Theorem (CLT) of the network dependent random variables (Condition ND). It is worth noting that there are two different units at play: the number of sampling units (i.e., individuals), N , and the number of dyads, M . To bridge these two units, we assume that the number of active dyads also goes to infinity as the number of sampling units is taken to infinity, eliminating the possibility of extremely sparse networks among individuals. This is empirically relevant. For example, in international trade, the entry of a new country/firm to a market will most likely increase the number of trade flows in the economy; in political economy, the more members of parliaments (MEPs) there are, the more pairs of the MEPs sitting next to each other there will be (see Section 5). 13 Assumption 3.1. M → ∞ as N → ∞.
12 Recall that we define the geodesic distance between two connected dyads m and m to be the smallest number of adjacent dyads between them.
13 This assumption is similar in spirit to Assumption 2.3 of Tabord-Meehan (2019) in which the minimum degree is assumed to grow at some constant rate relative to the number of individuals. It is milder than Assumption 2.3 of Tabord-Meehan (2019) since the latter does not allow any individual to be isolated, while Assumption 3.1 merely constrains the average degree. Similarly, this assumption is weaker than the assumption that the maximum degree in a network is bounded even when N → ∞ (e.g., Penrose andYukich (2003) andde Paula et al. (2018)).

Asymptotic Properties ofβ
We make use of the following two regularity assumptions for the proof of consistency ofβ for β (Theorem 3.1) and to derive its asymptotic distribution (Theorem 3.2). 14 Assumption 3.2 (Conditional Finite Moment of ε m ). There exists p > 4 such that as M → ∞, where p > 4 is the same as the in Assumption 3.2.
Assumption 3.2 requires that the errors are not too large once conditioned on common shocks. Together with the standard full rank assumption that guarantees identification of β 15 , this implies Assumption 3.1 of Kojevnikov et al. (2021) 3 is a condition that controls the tradeoff between the denseness of the underlying network and the covariability of the random vectors. If the network becomes dense, then the dependence of the associated random variables has to decay much faster. This requirement is consistent with the typical assumption in the spatial econometrics literature upon which the empirical analyses of the aforementioned examples (i.e., international trade, legislative voting, and exchange rates) are based, because it embodies the idea that spillovers decay as they propagate farther (see, e.g., Kelejian and Prucha (2010)). For instance, Acemoglu et al. (2015) assume that network spillovers are zero if agents are sufficiently distantly connected on a geographical network. This decay in the propagation of spillovers through the terms θ M,· . This assumption may be violated for very dense networks with low decay of spillovers.

Consistency
14 These assumptions are required for Theorem 3.2, but as usual, the proof of consistency (Theorem 3.1)can be derived under weaker conditions. (See Assumptions B.2 and B.3 and their associated discussion, in Appendix).
Theorem 3.1 establishes the consistency ofβ under the scenario where the number of sampling units N goes to infinity. When Assumption 3.1 is dropped, the result continues to hold in terms of the number of active dyads M .

Asymptotic Normality
is not a sum of independent variables, its variance cannot be simply expressed as a sum of the variances of Y M,m . We thus need to explicitly take into account covariance between the random variables {Y u M,m } m∈M N . We study the CLT for the normalized sum of Y u M,m , which is given by Assumptions 3.2 and 3.3 are written in terms of conditional expectations, whereas we are interested in the unconditional distribution of S M τ M . Assumption 3.4 below bridges the conditional and unconditional variances. The final assumption for the asymptotic normality result is a standard regularity condition guaranteeing that the asymptotic variance is welldefined. This assumption follows from both matrices in the expression being well-defined (e.g., x M,k having bounded support). 16 Assumption 3.4 (Growth Rates of Variances).
Under these assumptions, the asymptotic distribution ofβ is given by: which is positive semidefinite with finite elements.
16 Further note that Theorem 3.2 is proved under a weaker condition than Assumption 3.4.

Consistent Estimation of the Asymptotic Variance ofβ under Network Spillovers
Our objective is to consistently estimate AV ar(β) defined in Theorem 3.2. As errors are mean zero, Y M,m is centered, i.e., E Y M,m = 0 for each m ∈ M N .

The Estimator
The proposed estimator is a type of kernel estimator. Let b M denote the bandwidth, or the lag truncation (its choice is described in Section 3.4.2 below) and ω : R → [−1, 1] a kernel function such that ω(0) = 1, ω(z) = 0 whenever |z| > 1, and ω(z) = ω(−z) for all z ∈ R.
The feasible variance estimator of interest is with

Choice of Lag Truncation, b M :
There are two approaches for the choice of the associated lag truncation parameter. First, the researcher may already know (or is willing to impose) the truncation, perhaps due to a theoretical/institutional motivation. For instance, Acemoglu et al. (2015) set the lag to two in a related problem. Then, the thought exercise is that this choice will adapt as M → ∞ according to the assumptions below. Alternatively, the researcher could use a data-driven choice. Assumption 3.6 (c) below suggests it should depend on both the sample size and the network topology, including the average degree of the dyadic network. One such selection rule is suggested in Kojevnikov et al. (2021) based on their proofs: b M = 2 log(M )/ log(max(average degree, 1.05)).

Consistency of the Proposed Estimator
The consistency of the variance estimator requires two sets of additional assumptions. The first set is Assumption 4.1 of Kojevnikov et al. (2021), but stated here in terms of the network over dyads.
Assumption 3.6 (Kojevnikov et al. (2021), Assumption 4.1). There exists p > 4 such that Assumption 3.6 (a) is a stronger counterpart to Assumption 3.2, as it requires that a higher-order (i.e., higher than fourth order) conditional moment be well-defined. Assumption (b) posits a tradeoff between the kernel function, the denseness of a network and the dependence coefficients. Specifically, the kernel function ω M is required to converge to one sufficiently fast. Kojevnikov et al. (2021) demonstrate primitive conditions under which this requirement is fulfilled (Proposition 4.2). Assumption (c) requires that the correlation coefficients decay much faster relative to the denseness of the network. This is satisfied in the suggested choice for b M above.
Another set of conditions restricts the denseness of the network, ruling out the situation where the network becomes progressively dense: most notably, the case where every single individual unit is directly linked to every other individual.
The following theorem is the main theoretical contribution of this paper. Proof. See Appendix B.6. Theorem 3.3 establishes the consistency of our proposed variance estimator accounting for network spillovers across dyads in the sense of the Frobenius norm.

When to Use the Proposed Estimator and the Role of Decaying Spillover Effects
It follows from Theorem 3.3 that the dyadic-robust variance estimator (11) is inconsistent for the true variance when the underlying network involves a non-negligible degree of faraway correlations, as suggested in the examples of the previous section. 17 Specifically, the following corollary states that the dyadic-robust variance estimator of Aronow et al. (2015) may not necessarily be consistent when it is naïvely applied to the network-regression model with non-zero correlations beyond direct neighbors.
Corollary 3.1 (Inconsistency of Dyadic-Robust Estimators with Network Spillovers). Suppose that the assumptions required in Theorem 3.3 hold. Assume, in addition, that Then, the dyadic-robust estimator (11) applied to the network-regression model (1) and (2) is inconsistent.
The added condition (17) in Corollary 3.1 pertains to both the network configuration of active dyads and the regression variables. It represents a setting where the spillovers from far-away neighbors are non-negligible even when N is large. For instance, (17) can hold even if there are not many neighbors, as long as the covariances between the error terms are sufficiently large. This builds on Erikson et al. (2014) that inference with dyadic data may be biased if one only partially accounts for such spillovers. On the other hand, if far-away neighbors in the network have only a negligible effect on the cross-sectional dependence, the dyadic-robust variance estimator remains a good approximation for the asymptotic variance of linear dyadic data models with network spillovers across dyads. These insights are investigated further in Section 4 using numerical simulations.
These observations extend to settings where the spillovers decay along the network (i.e., when the correlation along unobservables decreases with the geodesic distance among dyads). Indeed, our estimator already accounts for such decay through the indirect covariances in its expression (16). When such spillovers propagate and decay is not too high, then condition (17) is satisfied, as such spillovers are non-neglible. On the other hand, if they decay at a very high rate (in the limit, a 100% decay from adjacent to connected dyads), then our estimator will become very similar to the dyadic-robust variance estimator.
However, some researchers may be willing to tolerate some asymptotic bias to still implement the dyadic-robust estimator even with some asymptotic bias. In such cases, when should they prefer our proposed estimator? While a general answer is complex because the (2015) and Proposition 3.1 and 3.2 of Tabord-Meehan (2019) also provide consistent variance estimators for the dyadic dependence case without higher-order network spillovers. However, these results and ours do not subsume one another. Indeed, their estimators can accommodate flexible dependence within clusters of dyads that share common units, while we assume that the network of spillovers is observed even if there are only adjacent connections. bias depends on both the strength of indirect spillovers and on the network configuration, we provide an intuitive criterion which is verified using: (i) a simple, yet informative, analytical example where spillovers decay exponentially with distance along the network (shown below), and (ii) extensive Monte Carlo simulations with decay and different network configurations (shown in Section 4). They illustrate that researchers should opt for the network-robust estimator above when they have a low tolerance for bias, when there are persistent indirect spillovers and/or when the network is not too sparse. Let B > 0 denote the maximum tolerance for condition (17) that the researcher is willing to allow when using the dyadic-robust variance estimator (11). Then, a sufficient condition for the researcher to prefer the proposed estimator (16) over (11) is that the decay rate γ is higher than a thresholdγ, where Proof. See Appendix B.8.
When the tolerated bias is small (B → 0), dependence is large enough and does not decay too fast (large γ), or the network is more dense, our approach is preferable because it provides a consistent estimator even with non-negligible spillovers. Since the network is observed, S and the last term are estimable and can be used for such diagnostics. Appendix C.4 provides further discussions, while the next section presents for our baseline results and discusses simulation exercises.

Simulation Design
We compare three types of variance estimators across different specifications and network configurations. We use the Eicker-Huber-White estimator as a benchmark, 18 the dyadicrobust estimator of Tabord-Meehan (2019) as a comparison accounting for the dyadic nature of the data (when inappropriately used in the presence of network spillovers), and our proposed estimator which is robust to network spillovers across dyads.
We first generate networks on which random variables are assigned. We follow Canen et al. (2020) among others by employing two models of random graph formations. They are referred to as Specifications 1 and 2. Specification 1 uses the Barabási and Albert's (1999) model of preferential attachment, with the fixed number of edges ν ∈ {1, 2, 3} being established by each new node. 19 Specification 2 is based on the Erdös-Renyi random graph Rényi, 1959, 1960) with probability p = λ N for N denoting the number of nodes and λ ∈ {1, 2, 3} being a parameter that governs the probability relative to the node size. The summary statistics for the networks generated by Specification 1 and 2 are given in Appendix C.1. The maximum degree and the average degree increase monotonically as we increase the parameters in both specifications. The number of active edges (i.e., dyads) also increases with the sample size regardless of the specification. This reflects that each node tends to have more direct links as the network becomes denser. In our exercises, the number of indirectly linked dyads also increases with network denseness. However, this is due to our simulated networks being relatively sparse. In other settings, the number of indirect connections may decrease with network density.
For each of the randomly generated networks, the simulation data is generated from the following simple network-linear regression: with m := d(i, j) representing the dyad between agent i and j. The dyad-specific regressor x m is defined as x m := |z i − z j |, where both z i and z j are drawn independently from N (0, 1). The regression coefficient is fixed to β = 1 across specifications.
The dyad-specific error term ε m is constructed to exhibit non-zero correlation with ε m 18 It is used in Bliss and Russett (1998) and Mansfield et al. (2000), for instance. 19 In generating the Barabási-Albert random graphs, we follow Canen et al. (2020) by choosing the seed to be the Erdös-Renyi random graph with the number of nodes equal the smallest integer above 5 √ N , where N denotes the number of nodes. 20 To simplify notation, we drop the M subscript, making the triangular array structure implicit.
as long as dyads m and m are connected (i.e., in the network terminology, there exists a path in the simulated network), while the strength of the correlation is assumed to decay as they are more distant. To that end, we draw ε m := m γ m,m η m,m , where γ m,m equals γ s if the distance between m and m is s, and 0 otherwise, for γ ∈ [0, 1] 21 and s ∈ {1, . . . , S} with S being the maximum geodesic distance that the spillover propagates to. Each η m,m is drawn i.i.d. from N (0, 1). Hence, γ controls the strength of spillover effects, representing their decay rate. If γ = 1, then spillover effects are the same no matter how far the agents are apart, i.e., the spillover effects do not decay. If γ = 0, there are no spillover effects, so the dyadic-robust variance estimator should be consistent. The case of S = 2 corresponds to a situation where up to friends of friends may matter for spillovers. We consider three scenarios for each type of network. In the main text, we set S = 2 and γ = 0.8. The results for S = 2 with γ = 0.2 are given in Appendix C.4, and the ones for S = 1 with γ = 0.8 are in Appendix C.5. Throughout the experiments, we employ the mean-shifted (by one) rectangular kernel with the lag truncation equal to two for the sake of comparison. 22

Results
In Table 1 we present the coverage probability for β and the average length of the confidence interval across simulations. To do so, we compute the t-statistic using the OLS estimator for β and different variance estimators under a Normal distribution approximation. 23 The finite-sample properties of the three variance estimators are further illustrated in Figure 2 in Appendix C.2.
The results for the empirical coverage probabilities depend on two dimensions: the sample size (N ) and the denseness of the underlying network (parametrized by ν and λ). The coverage probability for each estimator improves with the sample size. However, when spillovers are high (γ = 0.8), only our proposed network-robust variance estimator has coverage close at 95% nominal level: S = 2, γ = 0.8.

Specification 1
Specification 2 The upper-half of the table displays the empirical coverage probability of the asymptotic confidence interval for β, and the lower-half showcases the average length of the estimated confidence intervals. As the sample size (N ) increases, the empirical coverage probability for our estimator accounting for network spillovers approaches 0.95, the correct nominal level. However, that is not the case for alternative estimators.
to 95%, consistent with Theorem 3.3. Meanwhile, in this set-up, both the Eicker-Huber-White and the dyadic-robust variance estimators perform poorly as the underlying network becomes denser, no matter which specification of the network is involved. For example, in Specification 1 with ν = 3 and the largest sample size (N = 5000), the confidence intervals based on the Eicker-Huber-White and the dyadic-robust variance estimators do not cover the true parameter 615 and 455 times out of 5000 simulations (12.3% and 9.1%), respectively. On the other hand, the network-robust variance estimator is designed to capture higher-order correlations and, thus, its coverage remains stable across network configurations. A similar conclusion is drawn from the average length of the confidence intervals. The confidence intervals for the Eicker-Huber-White and dyadic-robust variance estimators are typically shorter than those for our proposed estimator when γ is large and S = 2. This means the former undercovers the true parameter (in the presence of positive spillovers), confirming our theoretical results (Corollary 3.1). The difference is often around 10 -20% of the length of the network-robust estimator and it increases with even denser networks (see Table 6 in Appendix). However, as the magnitude of spillovers decreases (i.e. γ tends to zero), higher-order spillovers are less pronounced, so that the biases from using the Eicker-Huber-White and dyadic-robust variance estimators disappear. This is shown in Table 7 of Appendix C.4 for the case of S = 2 and γ = 0.2. When S = 1, the dyadic-robust variance estimator coincides with our proposed estimator (i.e., there are no spillovers from non-adjacent links, or spillovers fully decay immediately). This is shown in Table 8 of Appendix C.5.
Finally, Appendix C.6 shows that the results are robust to spillovers that can reach the most distantly connected neighbors (S = ∞) and to choosing the lag-truncation adaptively. Rather than assuming a given value, we set b M = 2 log(M )/ log(max(average degree, 1.05)) as suggested above. The results are qualitatively and quantitatively very similar: when the decay is not too high (i.e., spillovers propagate), the confidence interval associated with the dyadic-robust estimator will undercover the true parameter even in moderate sample sizes.

Empirical Illustration: Legislative Voting in the European Parliament
We now turn to an empirical application to demonstrate the performance of our variance estimator with real-world data. In doing so, we revisit the work of Harmon et al. (2019) on whether legislators who sit next to each other in Parliament tend to vote more alike on policy proposals.
They focus on the European Parliament, whose Members (MEPs) are voted in through elections in each European Union (EU) member country every five years. The Parliament convenes once or twice a month, in either Brussels or Strasbourg, to debate and vote on a series of proposals. Once elected to the European Parliament (EP), these MEPs are organized into European Political Groups (EPGs), which aggregate similar ideological members/parties across countries. As Harmon et al. (2019) describe, these EPGs function as parties for many of the traditional party-functions in other legislatures, including coordination on policy and policy votes. Most importantly, MEPs sit within their EPG groups in the chamber. However, within each EPG group, non-party leaders traditionally sit in alphabetical order by last name. See Figure 4 in Appendix D.1 for an example. Hence, to the extent that last names are exogenous (i.e., politicians are not selected or choose last names anticipating seating arrangements in the European Parliament), seating arrangements are used as a quasi-natural experiment.

Data
We adopt the same dataset used in Harmon et al. (2019), which collects the MEP-level data on votes cast in the EP. The dataset records what each MEP voted for (Yes or No), where she was seated, and a number of individual characteristics (e.g. country, age, education, gender, tenure, etc). Its sample period is the plenary sessions held in both Brussels and Strasbourg between October 2006 and November 2010. This corresponds to the sixth and seventh terms of the European Parliament.
For our empirical illustration, we restrict the sample to the policies voted in Strasbourg during the seventh term and we focus on the seating pattern between July 14th -July 16th, 2009 (which involved 116 different proposals being voted on). The resulting sample has 2,431,261 observations, which are split into 422 politicians forming 26,099 pairs (i.e., dyads) of MEPs over 116 proposals. 24 Further information on the construction of our sample is detailed in Appendix D.3. These restrictions keep the main set-up in Harmon et al. (2019), while allowing us to evaluate variance estimators with smaller sample sizes.

Empirical Set-up
Let Agree d(i,j),t be an indicator that takes one if MEP i and j cast the same vote on proposal t, and zero otherwise. Likewise, SeatN eighbors d(i,j),t is a binary variable that equals one if MEP i and j are seated next to each other when the vote for proposal t is taken place, and zero otherwise. We view the seating arrangement as a network among the MEPs: i.e., two MEPs who are seated next to each other within the same political group are treated as an active dyad, which in turn constitutes a network over pairs of MEPs within each political group (see the note below Figure 4 for details). 25 We follow Harmon et al. (2019) in assuming that such seating arrangements are exogenously determined, i.e., the adjacency relation is exogenously formed. Their main specifica-tion is a linear model: The authors originally conducted inference using the estimator in Aronow et al. (2015), assuming that dyads m = d(i, j) cannot be correlated with m = d(k, l) unless they share a common member (i.e., there is no correlation across errors if i, j and k, l do not include a common unit). We will now compare this approach to using the variance estimator introduced in Section 3.4, which allows the error terms to exhibit non-zero correlations as long as they are connected on the network over dyads represented by the adjacency relation of seating arrangements in Parliament. In this analysis, we use the mean-shifted rectangular kernel with the lag truncation equal the longest path in the constructed network. This combination of the kernel density and the lag truncation enables us to accommodate all the possible correlations across connected dyads (i.e., pairs of MEPs), placing equal weight on each of them. This makes the comparisons across estimators more transparent. 26 Inspired by Harmon et al. (2019), we consider three specifications: (I) a simple linear regression model as given in (18)

Results
The main results of our empirical analysis are summarized in Table 2 Table 4), as they are close to 0.006 (their original results) and stable across specifications. 28 Hence, changes to point-estimates are not due to sample selection. The positive coefficient for SeatNeighbors suggests that the MEPs sitting together tend to vote more similarly than those sitting apart, providing evidence in favor of their original hypothesis. The coefficients on the covariates (displayed in Panel C of Appendix Table 13) are also quantitatively and 26 In Appendix D.4, we replicate this analysis with a different choice of kernel and setting the lag-truncation parameter following the criterion suggested above/in Kojevnikov et al. (2021). The results are very similar.
27 Following Harmon et al. (2019), we include indicators whether country of origins, quality of education, freshman status and gender, respectively, are the same, as well as differences in ages and tenures. See the note below Table 2 for details.
28 Note that our dependent variable is equal to one if two MEPs vote the same and zero otherwise, while Harmon et al. (2019) code it as one if MEPs vote differently. Hence, to compare our estimates with theirs, the signs on the estimates of SeatN eighbors must be flipped. qualitatively similar to those in their original paper. For instance, our estimates for Same-Country are 0.056, while their estimates are around 0.051, suggesting politicians from the same country are more likely to vote similarly on policies.
Panel B shows the standard errors for the regression coefficient of SeatNeighbors using different variance estimators. Building on Section 4, the panel compares three different types of heteroskedasticity robust estimators: namely, the Eicker-Huber-White, the dyadic-robust estimator (used in their original work), and our proposed network-robust estimator. We note that, due to the smaller sample size, the standard errors in our exercise are larger than the original authors'. Hence, the results in this section are not directly comparable to those in Harmon et al. (2019). Rather, we compare the different estimators within our own sample.
As foreshadowed in the Monte Carlo simulations, the Eicker-Huber-White estimates are the smallest, followed by the dyadic-robust estimates, which, in turn, are smaller than the network-robust estimates. In fact, for Specification (III), the Eicker-Huber-White estimate is roughly 73% smaller than using the estimator accounting for network spillovers across dyads, while the dyadic-robust one is 22% smaller. This fact entails two implications. First, our finding provides empirical evidence in support of the existence of indirect positive spillovers among the MEPs: even distant connections may indirectly generate correlated behavior among politicians i and j. Second, the use of alternative estimators not accounting for such spillovers undercovers the true parameter and may generate biased hypothesis testing about the regression coefficient of SeatNeighbors. The difference in estimates appears quantitatively meaningful in this empirical example.

Conclusion
With dyadic data, researchers typically assume that dyads are uncorrelated if they do not share a common unit, an assumption that is leveraged in inference. We showed that this assumption may be inappropriate in many models where interactions occur on a network: while data may be dyadic, the cross-sectional dependence may be much more complex and spillover beyond pairwise interactions. For instance, trade between countries may depend on trade between auxiliary partners and their unobservables. In political economy, whether a politician votes with a colleague may have spillovers from seating neighbors beyond one's own immediate ones. We verified this using both theoretical results and Monte Carlo simulations. We provided a new consistent variance estimator for parameters in a linear model with dyadic data with correct asymptotic coverage and good finite sample properties.
To conclude, we clarify that our goal in this exercise is neither to criticize dyadic-robust variance estimators, which are a fundamental part of the empiricist's toolkit, nor to suggest our approach should always be used. Rather, we wish to draw attention that researchers should fully specify the cross-sectional dependence in their model. If the conventional assumption of dyadic dependence correctly specifies the environment in question, or when spillovers beyond immediate neighbors might be negligible, then previous approaches suffice. However, as we have discussed above, existing applications may apply the latter method even if it is seemingly inappropriate to their setting. This includes situations where such network spillovers may be present or persistent (even with decay). In such scenarios, our estimator provides a possible solution. Those choices, though, must be guided by the application that empiricists face. Hence, building on Poast (2016), we recommend researchers to continue to fully specify their model, including full specification of their covariance structure, thereby clarifying what type of inference procedure is most appropriate for their environment.
with ρ M (m, m ) denoting the geodesic distance between dyads m and m , i.e., the smallest number of adjacent dyads between dyads m and m . In words, the set P M (a, b; s) collects all two distinct sets of active dyads whose sizes are a and b and that have no dyads in common. Next we consider a collection of bounded Lipschitz functions. Define with · ∞ representing the supremum norm and Lip(f ) being the Lipschitz constant. 29 In words, the set L K,c collects all the bounded Lipschitz functions on R K×c and the set L K moreover gathers such sets with respect to c ∈ N.
29 It is immediate to see that R is a normed space with respect to the Euclidean norm, while the R K×c Intuitively, this definition states that the upper bound must be decomposed into two components. The first part ψ a,b (f, g) is deterministic and depends on nonlinear Lipschitz functions f and g. The other component θ M,s is stochastic and depends only on the distance of the random variables on the underlying network. The former, nonrandom component reflects the scaling of the random variables as well as that of the Lipschitz transformations, while the latter random part stands for the covariability of the two random variables. We call θ M,s the dependence coefficient. We follow Kojevnikov et al. (2021) in assuming boundedness for these two components.
Assumption A.1 is maintained throughout the paper and employed to show asymptotic properties of our estimators such as the consistency and asymptotic normality, and the consistency of the network-robust variance estimator for dyadic data.

A.1 Related Literature
Our main insight in accommodating indirect spillovers is that we can rewrite the correlation structure among dyads as a dyadic network, where links denote whether they share a common member. As a result, this dyadic network describes how close/far certain dyads are from sharing members with other dyads. In doing so, the transformed problem is amenable to appropriate applications of recent developments in the statistics of random variables which are correlated along an (observable, exogenous) network. In particular, we apply asymptotic results for network-dependent random variables developed by Kojevnikov et al. (2021) 30 to an appropriately defined dyadic network, with assumptions imposed on the latter. Leung (2021) and Leung (2022) also apply the framework of Kojevnikov et al. (2021) to study, respectively, cluster-robust inference and causal inference for the case of individual-specific random variables. These papers focus on the correlation along a network over individuals, rather than over dyads. Meanwhile, Leung and Moon (2021) derive an asymptotic theory for dyadic variables in the context of networks, primarily for endogenous network formation models. The second term on the right hand side is equal to 0, due to Assumption B.1 (d). Next, Assumption B.1 (c) ensures existence of the inverse of the expectation term in the first term of the right hand side, ensuring identification.

B.2 Consistency ofβ
As usual, the Central Limit Theorem for a normalized sum requires us to have stronger conditions than what is required for consistency. Those stronger conditions were introduced in the main text as Assumptions 3.2 and 3.3. However, they are only required for Theorem 3.2. For the consistency proof (Theorem 3.1),we can replace those two assumptions by the following weaker conditions.
Assumption B.2 allows for the same interpretation as Assumption 3.2, i.e., the random error term ε m cannot be too large, conditional on a common component. This assumption, however, is less stringent than the previous one because it now requires the finiteness of a lower moment of ε m . Similar to Assumption 3.3, this assumption binds the covariance of the random variables, the dependence reflected in the dependence coefficients, and the underlying network. That is, given σ M growing at least at the same rate of M , the composite of the density of the network and the magnitude of the correlations of the random variables must decay fast enough.
Proof of Theorem 3.1: From (8), (12) and (13), we can writê where the last implication is a consequence of the Dominated Convergence Theorem. In view of Assumption 3.1, this is true also with respect to N going to infinity.
Since it holds by the Markov inequality that for any c > 0 it then follows that Finally, we can invoke the Cramér-Wold device to obtain as desired.

B.3 Lemma
Here we establish a lemma that is used repeatedly throughout the subsequent proofs in this paper. (ii) Suppose, moreover, that Assumption 3.7 holds. Then, Proof. (i) The fact that it is positive definite follows from Assumption 3.5. The fact that the elements are finite is proved by considering element-by-element convergence. Let x k,i denote the i-th element of x M,k . Then the (i, j) entry of 1 M k∈M E x M,k x M,k is given by: 1 M k∈M E x k,i x k,j . We write the (i, j) entry of A as A i,j . From Assumption 3.5, there exists a nonnegative finite constant C 0,1 such that Hence A i,j exists with being finite. By repeating the same argument for all i, j = 1, . . . , K, it holds that A exists with finite elements. (ii) To begin with, observe that Note that convergence of the second term follows from (i). Hence, we wish to prove that: To do so, we follow a strategy employed in Aronow et al. (2015) and Tabord-Meehan (2019). In light of (i), it remains to show As in (i), we consider the element-by-element convergence, using the same notation. The variance can be expressed as a sum of covariances: Again from Assumption 3.5, there exists a nonnegative finite constant C 0,2 such that Hence, where the last implication is due to Assumption 3.7. By repeating the same argument for all i, j = 1, . . . , K, we obtain Now, by the Chebyshev's inequality, we arrive at Furthermore, applying the Continuous Mapping Theorem yields obtaining the result. Therefore, as desired.

B.4 Asymptotic Normality ofβ
In this subsection, we prove Theorem 3.2 under a slightly milder condition than Assumption 3.4.

Assumption B.4 (Growth Rates of Variances).
There exists a sequence of (possibly random) When π N,M = 1, this assumption simplifies to Assumption 3.4, which is used for the results in the main text.
For our proof of the asymptotic distribution ofβ, we require that its asymptotic variance is well-defined. The first assumption, Assumption 3.5(a)-(b), is necessary for one of the matrices in the expression to be well-defined. 31 Part (c) assures that the middle part of the asymptotic variance is non-trivial.
When Assumption 3.4 is replaced by Assumption B.4, Assumption 3.5 must also be modified accordingly.  (15) simplifies to the assumption that appears in Lemma 1 of Aronow et al. (2015).
Proof of Theorem 3.2: From (13), where Φ(·) is the CDF of a standard Normal distribution. Then, by the law of total probability, we have where F X (·) denotes the probability distribution function of X. Now pick arbitrarily > 0. Then there exists M 0 > 0 such that for each M > M 0 where the first and second inequalities come from (21) and (20), respectively. Since the right hand side of (22) does not depend on t, we then have that for each M > M 0 , We have then shown that Next this can be combined with Assumption B.4 by using the Slutsky's Theorem, yielding that Moreover, applying the Cramér-Wold device gives where I K is the K × K identity matrix and τ M is understood as the variance-covariance matrix. 32 Now notice that we have .
Hence we obtain which is well-defined due to Lemma B.1 (i) along with Assumption B.5. When π N,M = 1, this is the result in the main text.

B.5 Lemma
In the proof of Theorem 3.3, we make use of the following lemma from Kojevnikov et al. Notice that since E Y M,m | C M = 0 a.s., it follows from the law of total variance that and N M is bounded due to Assumption 3.1, it thus suffices to show that Part (i) is already shown in Lemma B.1 (ii). Hence, it remains to prove Part (ii).
To begin with, observe that by the technique of add and subtract as well as the triangular inequality, We thus aim to prove We start with: (1) V c N,M − V N,M F p → 0: The proof proceeds in multiple steps: We begin with: 2 F → 0: We prove this by showing the element-wise convergence. With a slight abuse of notation, we denote the (a, b) entry of V c N,M and V N,M as V c a,b and V a,b , respectively. Then it is enough to verify that Notice that V c a,b and V a,b are given by where Y m,a and Y m,b stand for the a-th and b-th element of Y M,m , respectively.
Observe that By the Cauchy-Schwartz inequality, it then follows from from Assumption 3.6 (a) that there exists an a.s.-bounded functionC 1 such that E ε m ε j | C M ≤C 1 a.s. Similarly, we have an a.s.-bounded functionC 2 such that E ε k ε l | C M ≤C 2 a.s. Then, where x m,a represents the a-th element of x M,m and x j,b the b-th element of x M,j . Analogously, one obtains E Y k,a Y l,b | C M ≤ x k,a x l,bC2 a.s. Once again, through the multiple application of the Cauchy-Schwartz inequality, it follows that We note here that Assumption 3.5 ensures that there exists a nonnegative finite constant C m,a such that E x 8 l,a < C m,a , with the same argument holding true for x j,b , x k,a and x l,b as well. Hence, whereC is a nonnegative finite constant that is appropriately defined.
Substituting this into the inequality above, where the second inequality comes from Lemma B.2, and the last implication is due to Assumption 3.7. Therefore we have shown that By repeating the same argument for each a, b = 1, . . . , K, it follows that By the Chebyshev's inequality and the result of part (a), we complete part (1) as it follows that for any η > 0, for any η > 0.

First, we have 34
Observe that, by definition,ε m can be written asε m = ε m − x m (β − β). Hencê so that by the triangular inequality, for each m, j ∈ M N . Hence V N,M −Ṽ N,M F can be bounded as Now, since by Theorem 3.1, β − β 2 p → 0, and the application of the Continuous Mapping Theorem yields β − β 2 2 p → 0, it thus suffices to prove that each of R N,1 , R N,2 and R N,3 converges in probability to a finite number. In proving this, we follow a strategy employed in Aronow et al. (2015) and Tabord-Meehan (2019).
First let us study the expectation of R N,1 . By applying the Cauchy-Schwartz inequality repeatedly, we have that Here, in light of Assumption 3.6, there exists an a.s.-bounded function C 1 such that C 1 = sup N ≥1 max m∈M N E |ε m | 2 | C M , and moreover by Assumption 3.5, there exists a nonnegative finite number C 2 > 0 such that With a slight abuse of notation, we have for every N > 0 for some constant C ∈ (0, ∞), where the last inequality is because of Assumption 3.7.
Next let us study the variance of R N,1 . It suffices to show that E R 2 N,1 → 0, By the Cauchy-Schwartz inequality, it holds that Here, by Assumption 3.5 and the Cauchy-Schwartz inequality, there exists a nonnegative finite constant C 3 > 0 such that C 3 = sup N ≤1 max m,j,k,l∈M N E x m 2 2 x j 4 2 x k 2 2 x l 4 2 . Then, with a slight abuse of notation in writing C 1 2 3 as C 3 , we have Corollary A.2 of Kojevnikov et al. (2021) shows that there exists a nonnegative finite M,s , whereθ := sup M ≥1 max s≥1 θ M,s . Upon applying Lemma B.2 from the Appendix, we obtain where we apply Assumption 3.7 for the last implication, and C 4 and C 4 are nonnegative finite constants defined appropriately. Hence we have shown that R N,1 converges to a finite constant.
The proof of R N,2 is analogous.
It remains to show that R N,3 converges in probability to a finite constant. Let us first study the expectation of R N,3 . Observe that By Assumption 3.5, there exists a nonnegative finite number C 5 > 0 such that where we apply Assumption 3.7 in the last implication and a constant C ∈ (0, ∞) is appropriately defined.
Next let us consider the variance of R M,3 : Once again, Assumption 3.5 and the Cauchy-Schwartz inequality imply that there exists a nonnegative finite number C 6 > 0 such that where the last implication is a consequence of Assumption 3.7 (ii).
Therefore we have shown that To sum up, combining parts (1), (2) and (3), we have which completes the proof. B.7 Corollary 3.1 Proof of Corollary 3.1: For simplicity we denotê where we choose the kernel function and the lag truncation parameter so that the weights become equal one for all active dyads: namely, we use the mean-shifted rectangular kernel with the lag truncation being the length of the longest path in the network. Define moreover V ar(β) to be the same variance as in the main text - but now applied to the network-regression model (1) We prove that the inside the Frobenius norm does not converge in probability to zero. First it can immediately be shown, by Lemma B.1 (ii), that the "bread" part 1 Next plugging the definition ofε into the middle part, we have From Theorem 3.1, it can be seen that Q M,2 , Q M,3 and Q M,4 either converge to zero or diverge as N goes to infinity. When it comes to Q M,1 , observe that which never equals to zero due to the hypothesis (17) of this corollary. In either case, the middle part does not converge in probability to zero, meaning that NV N etwork → 0 is not true. This, however, contradicts the implication of the assumption that the dyadicrobust variance estimator is consistent. Hence, by means of contradiction, we conclude that the dyadic-robust variance estimator is not consistent, which completes the proof.

B.8 Example 3.1
Proof of Example 3.1 By the inequality of arithmetic and geometric means, the left-hand side of (17) can be bounded by , where S ≥ 2 denotes the length of the longest path in the network. As the first and third terms in both estimators are the same, then using the proposed network-robust estimator will be desirable if the middle term (i.e., the left-hand side above) is larger than the tolerated threshold, B: Passing logs on both sides yields the results. The lower bound is attained if γ s δ ∂ M (s) = γ s δ ∂ M (s ) for all s, s = 2, . . . , S. Note that S and the network densities {δ ∂ M (s)} s≥2 can be estimated following the definition (14), as a (sample) network is observable. Table 3 shows summary statistics (i.e., the average and maximum degrees) of the networks across nodes that are used in our simulation study. By construction, the maximum degree and the average degree monotonically increase in the parameters for both specifications.  Table 4 reports the degree characteristics of the networks when viewed as networks over the active edges. The table provides the average degree, the maximum degree, and the number of active edges (i.e., dyads). C.2 S = 2 and γ = 0.8

C.1 Summary Statistics
In this section, we further discuss the results of the Monte Carlo simulations presented in the main text. The asymptotic behaviors of the three variance estimators are illustrated in Figure 2, where the horizontal axes represent the sample size and the vertical axes indicate the standard error of the regression coefficient. The boxplots show the 25th and 75th percentiles across simulations, as well as the median, with the whiskers indicating the bounds that are not considered as outliers. The whisker length is set to cover ±2.7 times the standard deviation of the standard-error estimates. The light-, medium-and dark-gray boxplots describe the distribution of the Eicker-Huber-White, the dyadic-robust and our proposed network-robust variance estimates across simulations, respectively. The diamonds indicate the empirical standard errors of the estimates of the regression coefficients, what Aronow et al. (2015) call the true standard error. It is unsurprising that the empirical standard errors are the same across different variance estimators, as we use the sameβ. The boxplots show that as the sample size increases, the variation of the network-robust variance estimator shrinks, reaching the empirical standard error (the diamonds). This is as expected since this estimator is consistent for the true variance (Theorem 3.3). The estimates appear to vary little for moderate sample sizes (e.g., N = 1000). However, the other variance estimators (the lightand medium-gray boxplots) converge to lower values than the empirical standard errors (the diamonds), verifying their inconsistency in this environment with network spillovers, as shown by Corollary 3.1. As we make such spillovers very small (e.g., γ = 0.2 in Appendix C.4), all estimators have similar performance. This highlights the role of condition (17): namely, the dyadic-robust variance estimator might perform satisfactorily well as long as higher-order correlations beyond immediate neighbors are negligible. The light-gray box illustrates the Eicker-Huber-White standard error, the medium-gray one the dyadic-robust standard error and the dark-gray one the network-robust standard error. The diamonds stand for the empirical standard error, defined as the standard deviation of the estimates of the regression coefficient. The estimator is considered as not covering the true standard error when the diamond is outside of the shaded area.  Aronow et al. (2015) calls the true standard errors) and the means of the estimated standard errors for each variance estimator. The round brackets indicate the biases of each estimate relative to the true standard error in percentage (%). For instance, the Eicker-Huber-White variance estimator and the dyadic-robust variance estimator, when applied to Specification 1 with ν = 3, underestimate the true standard error by 21.45% and 14.14%, respectively. C.3 S = 2 and γ = 0.8 with Higher Density Parameters This subsection examines how an increase in the number of connected dyads affects the performance of the dyadic-robust variance estimator. Table 6 reports the results for the case of S = 2 with γ = 0.8, i.e., the same combination as the main text (Table 1), but for denser networks which set ν = 4, 5 for Specification 1 and λ = 4, 5 for Specification 2. We find that, while our estimator performs well (with coverage close to the nominal level), the bias in the Eicker-Huber-White and dyadic-robust estimators variance estimators are present and increase as the network becomes denser.
C.4 S = 2 and γ = 0.2 Table 7 presents the empirical coverage probability and average length of confidence intervals for β at 5% nominal size when S = 2 and γ = 0.2. The associated boxplots are given in Figure  3. Since the magnitude of spillovers is now much smaller than the case of γ = 0.8, there are only minor differences in performance between the network-robust variance estimator and the other two existing methods (namely, the Eicker-Huber-White and dyadic-robust variance estimators). In terms of convergence, the comparable performance of the dyadicrobust-variance estimator is evident in Figure 3. Comparing Table 7 to Table 1 highlights the impact of spillovers on the variance estimators. When the spillovers are substantially weak (e.g., γ = 0.2), the dyadic-robust variance estimator can serve as a good substitute for the network-robust one. In the case of relatively high spillovers (e.g., γ = 0.8), on the other hand, there are evident biases (around 4 percentage points for Specification 1 and 3 percentage points for Specification 2 when N = 5000). Based on this comparison, we suggest that the network-robust variance estimator be used when the correlations are expected to be relatively strong.

C.5 S = 1
For comparison purposes, this subsection explores the results for S = 1. If S = 1, there are no higher-order correlations beyond direct (adjacent) neighbors. Then, the network-robust variance estimator ought to coincide with the dyadic-robust variance estimator by definition, for any γ, as pointed out in Example 2.1. This is verified below for the case of γ = 0.8. Table  8 shows the simulation results.

C.6 S = ∞ with the Parzen kernel
In this subsection, we investigate the consequences of adaptively choosing the value of the lag-truncation parameter following the rule outlined in the main text. To this end, we set S = ∞ (i.e., spillovers may propagate to all neighbors), with the magnitude of the spillovers controlled by γ = 0.8 (the same as in the main text). In this environment, the spillovers are never truncated while decaying as they propagate farther. With regards to estimation, we consider the Parzen kernel, letting the lag-truncation parameter be chosen on the basis of Kojevnikov et al. (2021). The simulation results are given in Table 9, while the selected lag-truncation parameters are shown in Table 10.
The empirical coverage probability based on the network-robust variance estimator approaches to 95%, as expected. On the other hand, both the Eicker-Huber-White and dyadic- Table 7: The empirical coverage probability and average length of confidence intervals for β at 95% nominal level: S = 2, γ = 0.2.

Specification 1
Specification 2 N ν = 1 ν = 2 ν = 3 λ = 1 λ = 2 λ = 3 Note: This figure shows boxplots describing the estimated standard errors and the empirical standard errors for various combinations of parameters under Specification 1 (Barabási-Albert networks) and Specification 2 (Erdös-Renyi networks). The horizontal axis shows the number of nodes and the vertical axis represents the the standard error of the coefficient. The shaded boxes represent the 25th, 50th and 75th percentiles of estimated standard errors with the whiskers indicating the most extreme values that are not considered as outliers.
The light-gray box illustrates the Eicker-Huber-White standard error, the medium-gray one the dyadic-robust standard error and the dark-gray one the network-robust standard error. The diamonds stand for the empirical standard error, defined as the standard deviation of the estimates of the regression coefficient. This figure showcases the boxplots for the case when γ = 0.2. Table 8: The empirical coverage probability and average length of confidence intervals for β at 95% nominal level: S = 1, γ = 0.8.

Specification 1 Specification 2
N ν = 1 ν = 2 ν = 3 λ = 1 λ = 2 λ = 3 robust variance estimator understate the targeted nominal level, as claimed in the main text. It should be noted that these biases can become larger when the decay rate is slower. We focus on Specification 2, as it likely satisfies the assumptions above under S = ∞. After all, with S = ∞ and a very dense network, Assumption 3.4 is violated. Table 9: The empirical coverage probability and average length of confidence intervals for β at 95% nominal level, Specification 2: S = ∞, γ = 0.8, the Parzen kernel.
N λ = 1 λ = 2 λ = 3 Note: This table displays the lag-truncation parameters b M for the simulations in Table 9, selected using the rule: b M = 2 log(M )/ log(max(average degree, 1.05)), with M denoting the number of active dyads.
D Additional Information for the Empirical Illustration D.1 Seating Arrangement at the European Parliament  Table 11 lists the summary statistics of the seating arrangement (for Strasbourg at term 7) when viewed as a network over pairs of MEPs. Its summary statistics are consistent with those from the Erdös-Renyi random network with λ = 1 to λ = 3 (see Table 4). This suggests that our empirical illustration should perform well with the Parzen kernel and bandwidth choice proposed in Kojevnikov et al. (2021).  Table 4 for the definition of the first three indicators. The last two represent the number of adjacent and connected dyads, respectively.

D.3 Data Construction
Data construction for our empirical exercise in Section 5 proceeds in multiple steps: Step 1: Our subsample consists of the location of interest (i.e., Strasbourg) for the period of interest (i.e., Term 7). We select a further subset of the extracted data by seating Alafano and Alvaro are treated as adjacent because they are sitting next to each other and belong to the same political party, i.e., ALE. Similarly, Ojuland and Oviir are considered to be adjacent. On the other hand, following the original authors, Alvaro and Cavada are not regarded as adjacent though they are seated together because they belong to different political parties, i.e., ALE and PPE, respectively. In terms of dyad-level adjacency, Cavada-Coelho and Coelho-Collino are adjacent dyads as they share Coelho, whereas Cavada-Coelho and Collino-Comi are not adjacent, but they are still connected as they have indirect paths to one another along the dyadic network. arrangement (i.e., we focus on Pattern 1 for the present analysis -see Table 12).
Step 2: Since our analysis is concerned with voting concordance, we follow the original authors in dropping entries with missing data or "abstain" in the variable "vote." 36 Step 3: The resulting data still contains individuals belonging to "Identity, Tradition and Sovereignty (ITS)," one of the European Political Groups that dissolved in November 7, during the sixth term. We drop such MEPs from our analysis.
Step 4: The selected data is used to form the dyadic data registering the pair-of-MEPsspecific information. When pairing two MEPs, we follow Harmon et al. (2019) in focusing on those pairs of MEPs, both of whom are (i) in the same EPG; (ii) from an alphabetically-seated EPG; and (iii) non-leaders at the time of voting.
Our dyadic data consists of two types of variables: binary variables and numerical variables. The dyad-level binary (i.e., indicator) variables are defined to be one if the individual-level binary variables are the same, and zero otherwise. The dyadic-specific numerical variables in our analysis are the differences between the individual-level numerical variables, such as age and tenure. When calculating the differences in ages and tenures, we take the absolute values as we do not consider directional dyads, and we then rescale them into ten-year units. See the note below Table 2 for details. Table 13 reports the detailed result of the empirical illustration. As explained in Section 5.3, Panel A reports the estimates of the parameter of interest, while Panel B lists the standard errors based on the different variance estimators. In particular, we carry out the estimation using both the network-robust variance estimator with the mean-shifted rectangular kernel, and the one with the Parzen kernel, generating the same estimates. Panel C collects the parameter estimates for other covariates accompanied by the standard errors obtained from our proposed variance estimator from equation (10), and indicates the presence or absence of day-level fixed effects.  Note: Panel A displays the parameter estimates for the three different specifications; Panel B shows the standard errors for the regression coefficient of SeatNeighbors using different variance estimators; and Panel C collects the parameter estimates for other covariates accompanied by the standard errors obtained from our proposed variance estimator from equation (10), and indicates the presence or absence of day-level fixed effects. Adjacency of MEPs is defined at the level of a row-by-EP-by-EPG. (See the note below Figure 4.) Independent variables are as follows: Seat neighbors is an indicator variable denoting whether both MEPs sit together; Same country represents an indicator for whether both MEPs are from the same country; Same quality education is an indicator showing whether both MEPs have the same quality of education background, measured by if both have the degree from top 500 universities; Same freshman status encodes whether both MEPs are freshman or not; Age difference is the difference in the MEPs' ages; and Tenure difference measures the difference in the MEPs' tenures.