## 1 Motivation

The correct specification of the inherently unknown spatial process generating observable patterns of interrelatedness among the units of analysis constitutes a considerable challenge in cross-sectional studies. In particular, distinguishing substantively meaningful indirect spillover effects from spatially correlated random shocks is imperative because estimating a misspecified model carries a serious risk of incorrect inferences (e.g., LeSage and Pace 2009; Darmofal 2015). Unfortunately, while unfocused tests for spatial autocorrelation commonly applied in empirical research, like Moran’s *I* (e.g., Cliff and Ord 1981), help to detect spatial clustering in model residuals, they do not provide guidance on the exact process generating these dependencies. Since spatial regression models differ with respect to the implied pathways of dependence, these simple diagnostic tools do not allow researchers to identify the adequate model specification.

To address this specification problem, many empirical model selection strategies proposed in the literature feature the estimation of the spatial Durbin model (SDM) as an unrestricted nesting model and utilize the Wald test to scrutinize nonlinear common factor restrictions implied by pure error dependence (e.g., Burridge 1981).$^{1}$ Since the Wald test is asymptotically equivalent to the likelihood ratio (LR) and the Lagrange multiplier (LM) tests, which are the two alternative likelihood-based specification tests, the choice of a test statistic is oftentimes motivated by convenience or familiarity (e.g., LeSage and Pace 2009, 55). However, while previous studies in the field of spatial econometrics report notable differences among these tests with respect to their finite sample properties (e.g., Mur and Angulo 2006; Mur and Angulo 2009), the Wald test’s sensitivity to algebraically equivalent alternative formulations of the null hypothesis has been largely overlooked. Given that this result is well established in a time-series context (e.g., Gregory and Veall 1985; Lafontaine and White 1986; Breusch and Schmidt 1988; Dagenais and Dufour 1991; de Paula and Cribari-Neto 1993; Goh and King 1996) and given the importance of distinguishing spillover effects from residual correlation for substantive inferences, this neglect is striking.

To remedy this omission, the present study evaluates the Wald test’s appropriateness for differentiating between alternative mechanisms that cause spatial clustering in cross-sectional data structures. It discusses the substantive and econometric implications of alternative spatial processes and shows analytically that the Wald test’s lack of invariance to reparameterizations of nonlinear common factor restrictions stems from the necessity to approximate the restrictions’ sampling distributions. Although this approximation is asymptotically valid, Monte Carlo experiments demonstrate that it frequently leads to misleading inferences concerning the presence of spillover effects across a wide range of parameter settings in finite samples. An empirical example further illustrates the severity of this problem for applied research aiming to assess the support for distinct theoretical mechanisms against possible alternative explanations. Given that a misspecification of the process generating cross-sectional dependencies can bias substantive inferences, the results suggest that, irrespective of the specification search strategy employed, researchers should not base inferences on the Wald statistic’s asymptotic $\chi ^{2}$ distribution. Instead, simulation techniques such as bootstrap methods allow researchers to use estimated critical values as an alternative to their asymptotic counterparts. The LR test also offers a valuable alternative procedure that is invariant to reparameterizations of the null hypothesis.

## 2 Substantive and Residual Dependence in Cross-Sectional Models

In regression analyses utilizing cross-sectional data, three different types of interaction effects can be distinguished that generate spatial autocorrelation in the dependent variable. First, endogenous interaction effects occur whenever the units’ outcomes are intertwined. In these situations, the actions, decisions, or behaviors of the units are simultaneously determined and responsive to the other units’ outcomes. Second, exogenous interaction effects cause spatial clustering by linking the response of each unit to the covariates of other units. Finally, cross-sectional dependencies can be a product of spatially correlated model residuals (e.g., Elhorst 2014b; Halleck Vega and Elhorst 2015). While endogenous and exogenous interaction effects are part of the regression’s systematic component, correlation among the error terms is confined to the model residuals and does not affect the expectation of the outcome conditional on the regressors.

Despite their close correspondence (see, e.g., Gibbons and Overman 2012), the distinction between the different mechanisms causing spatial dependencies in the data has far-reaching implications for the estimation and interpretation of the regression coefficients (LeSage and Pace 2009; Rüttenauer 2019). Importantly, substantively meaningful indirect spillover effects, loosely defined as the impact of changes in one unit’s covariates on the other units’ outcomes, only exist if cross-unit interactions are part of the regression’s systematic component (Elhorst 2010; Darmofal 2015; Halleck Vega and Elhorst 2015). In these instances, the cross-partial derivative of unit *i*’s outcome $y_{i}$ with respect to unit *j*’s covariate $x_{j}$ is nonzero, signifying a systematic relationship between the units (e.g., LeSage and Pace 2009, 38). Otherwise, the regression model imposes the restriction of no spillovers by assumption.$^{2}$ To detect the existence of these spillover effects, different model specification search strategies suggest utilizing the unrestricted SDM model as a general model featuring substantive as well as residual dependence and subsequently testing several parameter restrictions (e.g., Mur and Angulo 2006; LeSage and Pace 2009; Elhorst 2010).

### 2.1 An Illustrative Example of the Different Spatial Processes

Before outlining the alternative spatial model specifications, it is useful to contrast the different spatial processes with respect to their substantive implications for empirical political science research.

Spillover effects occur whenever the behavior (endogenous interactions) or certain characteristics (exogenous interactions) of one unit, be it a country, a (coalition) government, a political party, or any other entity of interest, affect adjacent units (Darmofal 2015, 5).$^{3}$ For example, a municipality’s income tax revenues are directly related to its local economic performance. At the same time, the economic prosperity of its neighbors also exerts a positive impact on a municipality’s income tax revenues as its residents might commute to work. These dependencies in income tax revenues produce spillovers between the municipalities: the effect of a change in one unit’s characteristics propagates to its neighbors (e.g., LeSage and Pace 2009). Hence, adequately understanding the phenomenon of interest, cross-sectional variation in income tax revenues, necessitates the consideration of these exogenous interaction effects among the municipalities that generate substantively meaningful and theoretically relevant indirect spillover effects.

In contrast, cross-sectional dependencies in the disturbances constitute another spatial process that has different substantive implications and requires an alternative model specification. Several circumstances can cause correlation between the units’ residuals, for example spatial clustering in measurement errors. Alternatively, omitting a spatially dependent explanatory variable that is part of the true data-generating process (DGP) from the regression equation leads to correlated errors (Elhorst 2014b; Darmofal 2015). With regard to the example given above, it is reasonable to expect that several unobservable but spatially dependent characteristics, like a region’s general appeal as a place of residence, affect a municipality’s revenue from income taxation as well. In contrast to the theorized exogenous interaction effects, these omitted characteristics merely affect the model residuals and there are no relevant indirect effects present in the process that generates the data. Since exactly modeling the true dependence structure is almost impossible (e.g., Juhl 2020b), omitting relevant variables that are spatially correlated can create linkages among the units’ disturbances.

### 2.2 Common Factors in the Spatial Durbin Model

While the previous example illustrates the crucial differences in the substantive implications of the underlying process that links the observations to one another, model misspecification may cause severe econometric problems as well. In fact, neglecting cross-sectional interdependencies can induce correlation between the regressors and the residuals, resulting in the canonical endogeneity problem (e.g., Gibbons and Overman 2012; Betz, Cook, and Hollenbach 2019).

To illustrate the problem of omitted spillover effects, consider the stylized DGP in which a dependent variable $\boldsymbol {y}$ is entirely determined by two uncorrelated regressors, denoted $\boldsymbol {x}$ and $\boldsymbol {z}$, such that $\boldsymbol {y}=\boldsymbol {x}\beta +\boldsymbol {z}$.$^{4}$ Assume that $\boldsymbol {z}$ is unobservable and follows a spatial autoregressive process such that $\boldsymbol {z}=\rho \boldsymbol {Wz}+\boldsymbol {u}_{1}$, where $\rho $ is a scalar parameter, $\boldsymbol {W}$ is an exogenously defined connectivity matrix, and $\boldsymbol {u}_{1}$ is a vector of independent and identically distributed normal disturbances with zero mean and a fixed variance.$^{5}$ Solving the autoregressive process for $\boldsymbol {z}$ and substituting the result into the outcome equation leads to the spatial error model (SEM) specification:

$$\boldsymbol {y}=\boldsymbol {x}\beta +(\boldsymbol {I}_{n}-\rho \boldsymbol {W})^{-1}\boldsymbol {u}_{1}. \tag{1}$$

Due to the uncorrelatedness of $\boldsymbol {x}$ and $\boldsymbol {z}$, omitting the spatially autocorrelated variable does not lead to endogeneity concerns. Furthermore, the relationship depicted in Equation (1) implies no indirect spillover effects since the cross-unit interactions are confined to the residuals. Consequently, even a nonspatial OLS model specification would provide unbiased but inefficient parameter estimates (Lacombe and LeSage 2015).

Now consider a slightly modified scenario in which the included and the omitted regressors are no longer independent of one another but correlated. To induce correlation between the variables, suppose that the random variable $\boldsymbol {u}_{1}$ in Equation (1) is replaced by $\boldsymbol {u}_{2}$ which is an additive linear function of $\boldsymbol {x}$ and a stochastic disturbance term $\boldsymbol {v}$ such that $\boldsymbol {u}_{2}=\boldsymbol {x}\gamma +\boldsymbol {v}$. In addition to the spatial autocorrelation, the unobserved covariate $\boldsymbol {z}$ is now also correlated with $\boldsymbol {x}$, and the scalar $\gamma \in (0,1]$ as well as the dispersion of $\boldsymbol {v}$ ($\sigma _{v}^{2}$) jointly determine the strength of the correlation. In this slightly modified scenario, the true DGP becomes

$$\boldsymbol {y}=\boldsymbol {x}\beta +(\boldsymbol {I}_{n}-\rho \boldsymbol {W})^{-1}(\boldsymbol {x}\gamma +\boldsymbol {v}). \tag{2}$$

In contrast to the DGP shown in Equation (1), the correlation between the included regressor $\boldsymbol {x}$ and the spatially clustered variable $\boldsymbol {z}$ causes an endogeneity problem if $\boldsymbol {z}$ is omitted from the regression’s systematic part. Importantly, the effect of a change in regressor $\boldsymbol {x}$ on the outcome $\boldsymbol {y}$ in Equation (2) is more complex, and neither a nonspatial OLS model nor a SEM model yields unbiased estimates since both ignore the indirect effects produced by the spatial patterning of the correlated omitted variable. Consequently, the effect of a change in $x_{i}$ is not confined to unit *i*’s outcome $y_{i}$ but rather pertains to all nonisolated units in the entire system through indirect spillover and instantaneous feedback effects (e.g., LeSage and Pace 2009; Betz *et al.* 2019).

To address the endogeneity problem and to identify meaningful spillover effects, the unrestricted SDM model plays an important role. In fact, it serves as a general nesting model in many specification search strategies since it comprises several simpler spatial regression models frequently employed in empirical studies (e.g., Mur and Angulo 2009; Elhorst 2010; Angulo and Mur 2011).$^{6}$ By allowing researchers to test different parameter restrictions, the SDM model facilitates the specification of an econometric model that reflects the spatial process generating the data most appropriately. For the hypothetical scenario with one regressor $\boldsymbol {x}$ and the (possibly correlated) unobserved variable $\boldsymbol {z}$ discussed here, the SDM model to be estimated takes the following form:

$$\boldsymbol {y}=\rho \boldsymbol {Wy}+\boldsymbol {x}\beta +\boldsymbol {Wx}\theta +\boldsymbol {\varepsilon }. \tag{3}$$

While it is easy to verify that the SDM model reduces to the popular spatial autoregressive (SAR) model, also known as spatial lag model (e.g., Elhorst 2014b, 5), that features global spillover effects if $\theta = 0$ and to the spatial lag of X (SLX) model featuring local spillovers when $\rho = 0$, it also subsumes the SEM model that rules out any substantive indirect effects by assumption.$^{7}$ To illuminate the relationship between these models, it is useful to restate the SEM DGP displayed in Equation (1) by multiplying both sides of the equation by $(\boldsymbol {I}_{n}-\rho \boldsymbol {W})$ and rearranging terms, which results in the following structural form (Burridge 1981; Anselin 2003):

$$\boldsymbol {y}=\rho \boldsymbol {Wy}+\boldsymbol {x}\beta +\boldsymbol {Wx}(-\rho \beta )+\boldsymbol {u}_{1}. \tag{4}$$

Equation (4) elucidates that the SEM process imposes a nonlinear common factor restriction on the parameter associated with $\boldsymbol {Wx}$ which can be assessed by using estimates from the unrestricted SDM model depicted in Equation (3).$^{8}$ More precisely, if the estimates from the unrestricted SDM model satisfy the common factor restriction $\theta =-\rho \beta $, the model can be simplified to a SEM model specification because there are no discernible substantive spillover effects in the data. Given that $E(\hat {\beta })-\beta =0$ in Equation (4), the common factor restriction holds for SEM processes.
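The link between the common factor restriction and the absence of spillovers can be checked numerically. The sketch below assumes the standard expression for the matrix of partial derivatives of the SDM with one regressor, $(\boldsymbol {I}_{n}-\rho \boldsymbol {W})^{-1}(\boldsymbol {I}_{n}\beta +\boldsymbol {W}\theta )$, as discussed by LeSage and Pace (2009); the four-unit chain, the parameter values, and the function name `sdm_effects` are hypothetical choices made only for illustration.

```python
import numpy as np

def sdm_effects(W, rho, beta, theta):
    """Matrix of partial derivatives dy_i/dx_j implied by an SDM with one regressor."""
    n = W.shape[0]
    I = np.eye(n)
    # (I - rho*W)^{-1} (I*beta + W*theta), solved without forming the inverse
    return np.linalg.solve(I - rho * W, beta * I + theta * W)

# Hypothetical example: four units on a line, row-standardized weights
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = A / A.sum(axis=1, keepdims=True)
rho, beta = 0.5, 2.0

S_sdm = sdm_effects(W, rho, beta, theta=0.8)           # restriction violated
S_sem = sdm_effects(W, rho, beta, theta=-rho * beta)   # theta = -rho*beta

off_diag = ~np.eye(4, dtype=bool)
print("largest cross-partial, unrestricted:", np.abs(S_sdm[off_diag]).max())
print("largest cross-partial, common factor:", np.abs(S_sem[off_diag]).max())
```

When $\theta =-\rho \beta $, the second factor equals $\beta (\boldsymbol {I}_{n}-\rho \boldsymbol {W})$, so the effects matrix collapses to $\beta \boldsymbol {I}_{n}$ and all cross-partial derivatives vanish up to floating-point error.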

However, if $\boldsymbol {x}$ and $\boldsymbol {z}$ are correlated in the true DGP, an endogeneity problem occurs since the variable $\boldsymbol {z}$ is unobserved and the common factor restriction no longer holds. Restating the DGP in Equation (2) that features this correlation in a similar fashion yields:

$$\boldsymbol {y}=\rho \boldsymbol {Wy}+\boldsymbol {x}(\beta +\gamma )+\boldsymbol {Wx}(-\rho \beta )+\boldsymbol {v}. \tag{5}$$

Given the discussion above, Equation (5) illustrates that, in the presence of an omitted variable that is (i) spatially clustered and (ii) correlated with an included regressor, the true DGP resembles the SDM specification. Although the SDM model shown in Equation (3) provides consistent estimates for $E(\hat {\rho })=\rho $ and $E(\hat {\theta })=-\rho \beta $, the estimate of $\beta $ is asymptotically biased since $E(\hat {\beta }) = \beta + \gamma $. Based on these model estimates, the common factor restriction does not hold because of the endogeneity bias present in $E(\hat {\beta })$. Hence, a violation of the common factor restriction is indicative of the existence of indirect spillover effects that need to be included in the systematic part of the regression model.

As this discussion suggests, the spatially lagged exogenous variables in the SDM model specification can also be understood as instruments for omitted variables that are correlated with included regressors (e.g., Elhorst 2014b, 18). At the same time, as Gibbons and Overman (2012) emphasize, this strategy provides only weak identification of the model parameters and crucially depends on the exogeneity and the assumed perfect knowledge of $\boldsymbol {W}$. Consequently, the SDM model does not provide a general solution to the omitted variables problem that would allow researchers to identify causal effects. Just like in any other observational study, doing so requires the application of appropriate research designs (see also Betz *et al.* 2019; Rüttenauer 2019). Moreover, while the SDM specification features global spillover effects, local spillovers would require a different model specification like the spatial Durbin error model (e.g., Halleck Vega and Elhorst 2015).

## 3 The Wald Test of Nonlinear Restrictions

Given the estimates from an unrestricted model, the Wald test is flexible enough to scrutinize several different and possibly nonlinear parameter constraints within the same model. Intuitively, the Wald test assesses the distance between the observed estimates and the restrictions imposed. As the distance grows, the restrictions become less plausible. In contrast to the LM and the LR tests, which constitute prominent and asymptotically equivalent alternative specification tests, the Wald test does not require the estimation of restricted alternative models (e.g., Burridge 1981).

Despite its advantages, a major drawback of the Wald test of nonlinear restrictions is its lack of invariance to algebraically equivalent expressions of the null hypothesis. Since the asymptotic distribution of a nonlinear restriction needs to be approximated by a Taylor series expansion, seemingly identical functional representations may produce different test statistics. Although this undesirable property is a well-known feature of the Wald test (e.g., Gregory and Veall 1985; Lafontaine and White 1986; Breusch and Schmidt 1988; Dagenais and Dufour 1991; de Paula and Cribari-Neto 1993; Goh and King 1996), its consequences for empirical spatial model search strategies have been neglected so far.

### 3.1 Analytical Derivation and Asymptotic Distribution of the Wald Statistic

Consider a situation in which a test needs to be constructed in order to evaluate a single nonlinear restriction $H_{0}: g(\boldsymbol {\lambda }) = 0$, where $\boldsymbol {\lambda }$ is a parameter vector and $g(\cdot )$ is some function that is continuously differentiable in a neighborhood of $\boldsymbol {\lambda }$. For this general case, the Wald statistic is defined by

$$w=\frac {g(\boldsymbol {\hat {\lambda }})^{2}}{\widehat {V(g(\boldsymbol {\hat {\lambda }}))}}, \tag{6}$$

where $\widehat {V(g(\boldsymbol {\hat {\lambda }}))}$ is the estimated variance of $g(\boldsymbol {\hat {\lambda }})$. Under the null hypothesis, *w* asymptotically follows a $\chi ^{2}$ distribution with the number of degrees of freedom equal to the number of restrictions imposed.

The only complication involved here is that obtaining *w* necessitates knowledge about the sampling distribution of a nonlinear function. While it is straightforward to compute the value of $g(\boldsymbol {\hat {\lambda }})$ at the parameter estimates, deriving the Wald statistic in Equation (6) additionally requires information about its variability which depends on the estimator $\boldsymbol {\hat {\lambda }}$ and the restriction $g(\cdot )$. However, given the restriction’s nonlinearity, exact distributional results become inapplicable (Greene 2012). Instead, the delta method provides an approximation of the restriction’s asymptotic distribution.$^{9}$ It is based on a first-order Taylor series expansion of $g(\boldsymbol {\hat {\lambda }})$ around the true parameter vector $\boldsymbol {\lambda }$. Assuming that $\boldsymbol {\hat {\lambda }}$ is a consistent estimator with a limiting distribution defined by ${\sqrt {n}(\boldsymbol {\hat {\lambda }} - \boldsymbol {\lambda }) \overset {d}{\rightarrow } \mathcal {N}(\boldsymbol {0}, \boldsymbol {\Sigma })}$ and that the standard regularity conditions hold (see, e.g., Newey and McFadden 1994), the delta method implies that

$$\sqrt {n}\big (g(\boldsymbol {\hat {\lambda }})-g(\boldsymbol {\lambda })\big )\overset {d}{\rightarrow }\mathcal {N}\big (0,\boldsymbol {G}(\boldsymbol {\lambda })\boldsymbol {\Sigma }\boldsymbol {G}(\boldsymbol {\lambda })'\big ), \tag{7}$$

where $\boldsymbol {G}(\boldsymbol {\lambda }) = \partial g(\boldsymbol {\lambda })/ \partial {\boldsymbol {\lambda }'}$ is a row vector of partial derivatives. It follows that the asymptotic distribution of the restriction under $H_{0}$ is $g(\boldsymbol {\hat {\lambda }}) \overset {a}{\sim } \mathcal {N}\big (g(\boldsymbol {\lambda }), \boldsymbol {G}(\boldsymbol {\lambda })n^{-1}\boldsymbol {\Sigma }\boldsymbol {G}{(\boldsymbol {\lambda })'}\big )$. By using the consistent estimates obtained from an unrestricted model, the restriction’s sampling variability derived from the delta method is given by

$$\widehat {V(g(\boldsymbol {\hat {\lambda }}))}=\boldsymbol {G}(\boldsymbol {\hat {\lambda }})\widehat {\boldsymbol {\Sigma }}\boldsymbol {G}(\boldsymbol {\hat {\lambda }})' \tag{8}$$

with $\boldsymbol {G}(\boldsymbol {\hat {\lambda }})$ evaluated at $\boldsymbol {\hat {\lambda }}$ and $\widehat {\boldsymbol {\Sigma }} = n^{-1}\boldsymbol {\Sigma }$ being a consistent estimator of the symmetric, positive definite asymptotic variance–covariance matrix. By substituting Equation (8) into Equation (6), the Wald test statistic can be calculated. Asymptotically, since $\text {plim}_{n\to \infty }\boldsymbol {\hat {\lambda }}=\boldsymbol {\lambda }$, the function $g(\boldsymbol {\hat {\lambda }})$ converges in distribution to $g(\boldsymbol {\lambda })$ with a mean given by $\text {plim}_{n\to \infty }g(\boldsymbol {\hat {\lambda }})=g(\boldsymbol {\lambda })$. At the same time, the necessity to estimate the nonlinear restriction’s sampling variability can cause a mismatch between the Wald statistic’s asymptotic $\chi ^{2}$ distribution and its finite sample distribution which has considerable consequences for hypothesis testing (e.g., Lafontaine and White 1986; Phillips and Park 1988).
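The computation described above, the squared restriction divided by its delta-method variance $\boldsymbol {G}\widehat {\boldsymbol {\Sigma }}\boldsymbol {G}'$, can be sketched in a few lines. The function name `wald_statistic`, the parameter ordering $[\rho ,\beta ,\theta ]$, and all numerical values are hypothetical; in practice the estimates and their covariance matrix would come from an estimated unrestricted model.

```python
import numpy as np

def wald_statistic(g, grad, lam_hat, cov_hat):
    """Wald statistic for a single restriction g(lambda) = 0 (delta method)."""
    lam_hat = np.asarray(lam_hat, dtype=float)
    G = np.asarray(grad(lam_hat), dtype=float)  # row vector of partial derivatives
    var_g = G @ cov_hat @ G                     # delta-method variance, G * Sigma_hat * G'
    return g(lam_hat) ** 2 / var_g              # squared restriction over its variance

# Hypothetical estimates ordered as [rho, beta, theta]; NOT taken from the article
lam_hat = np.array([0.4, 2.1, -0.6])
cov_hat = np.diag([0.01, 0.04, 0.09])           # hypothetical covariance matrix

# Common factor restriction written as theta + rho*beta = 0
g = lambda lam: lam[2] + lam[0] * lam[1]
grad = lambda lam: np.array([lam[1], lam[0], 1.0])

w = wald_statistic(g, grad, lam_hat, cov_hat)
print("Wald statistic:", w)
```

Passing the gradient analytically keeps the sketch transparent; a numerical gradient would serve equally well for restrictions whose derivatives are tedious to write out.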

### 3.2 The Wald Test of Common Factors in Spatial Models

The analytical results derived above are directly applicable to the empirical assessment of the common factor restriction in spatial regression models because the null hypothesis can be expressed as a nonlinear function of the estimates obtained from an unrestricted SDM model. In order to calculate the test statistic, it is necessary to determine the functional representation of the null hypothesis of common factors. Yet, there are numerous algebraically equivalent alternative parameterizations that satisfy $g(\boldsymbol {\hat {\lambda }})=0$, where $\boldsymbol {\hat {\lambda }}=[\hat {\rho },\hat {\beta },\hat {\theta }]$ are the estimates obtained from the SDM model in Equation (3).

Table 1, for example, lists the four alternative expressions of the null hypothesis considered by Gregory and Veall (1986) in a time-series context. While all of the alternative statements declare the same restriction, they produce distinct test statistics and *p* values in finite samples because the approximation of the restriction’s sampling variability is based on the partial derivatives of the restriction with respect to the parameters. Depending on the exact representation of the null hypothesis, the right part of Table 1 shows that the vector of partial derivatives obtained from these nonlinear functions differs.

As a result, alternative expressions of the common factor hypothesis use different estimators for the nonlinear restriction’s sampling variability which yields distinct test statistics and causes them to converge to the asymptotic $\chi ^{2}$ distribution at individual rates. In large samples, this circumstance is unproblematic as the accuracy of the Taylor series approximation increases in sample size while the contribution of the restriction’s estimated variability to the test statistic becomes negligible. In finite samples, however, the differences between the alternatives can be substantial (e.g., Gregory and Veall 1985). At worst, alternative and algebraically identical functional representations of the parameter restriction can indicate opposing conclusions regarding its validity despite the fact that the same model estimates are used to calculate the test statistic.
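A small numerical sketch makes the noninvariance concrete. It compares two algebraically equivalent ways of writing the common factor restriction, $\theta +\rho \beta =0$ and $\beta +\theta /\rho =0$, using the same estimates and the same covariance matrix. Both formulations are chosen for illustration and are not necessarily the ones listed in Table 1; the estimates and the diagonal covariance matrix are made up and do not stem from any estimated model.

```python
import numpy as np

def wald(g, grad, lam_hat, cov_hat):
    """Delta-method Wald statistic for a single restriction g(lambda) = 0."""
    G = grad(lam_hat)
    return g(lam_hat) ** 2 / (G @ cov_hat @ G)

# Made-up estimates ordered as [rho, beta, theta] and a made-up covariance matrix
lam_hat = np.array([0.2, 2.0, 0.1])
cov_hat = np.diag([0.02, 0.02, 0.04])

# Formulation A: theta + rho*beta = 0
g_a = lambda l: l[2] + l[0] * l[1]
grad_a = lambda l: np.array([l[1], l[0], 1.0])

# Formulation B: beta + theta/rho = 0 (equals formulation A divided by rho)
g_b = lambda l: l[1] + l[2] / l[0]
grad_b = lambda l: np.array([-l[2] / l[0] ** 2, 1.0, 1.0 / l[0]])

w_a = wald(g_a, grad_a, lam_hat, cov_hat)
w_b = wald(g_b, grad_b, lam_hat, cov_hat)
crit = 3.841  # asymptotic chi-squared critical value, df = 1, alpha = 0.05

print(f"w_A = {w_a:.3f}, reject: {w_a > crit}")
print(f"w_B = {w_b:.3f}, reject: {w_b > crit}")
```

With these illustrative inputs, formulation A stays below the asymptotic critical value while formulation B exceeds it, although both are computed from identical estimates.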

Importantly, while many statistical software packages used to estimate spatial regression models, like Stata or R packages, report results from a Wald test by default, they do not test for common factors. Instead, the null hypothesis these packages evaluate is cross-sectional independence, that is, $\hat {\rho }=0$. Since this is a linear restriction, the Wald statistic’s noninvariance problem does not arise and the tests these packages perform are only indicative of the presence of nonrandom spatial clustering. They do not permit any inferences regarding the spatial process at work. Hence, scrutinizing the common factor restriction requires researchers to amend the Wald test’s null hypothesis.

### 3.3 Modifications of the Wald Test

Since the application of the Taylor series expansion results in distinct Wald statistics for algebraically equivalent formulations of the null hypothesis, the asymptotic $\chi ^{2}$ distribution might constitute an inappropriate approximation of the statistic’s finite sample distribution for some parameterizations. Therefore, modifications proposed in the literature that attempt to address the Wald test’s noninvariance problem primarily focus on adjusting the statistic’s reference distribution.

Phillips and Park (1988), for example, show that an Edgeworth expansion of the Wald statistic provides additional information on the statistic’s distribution which can be used to obtain corrected critical values and modified test statistics for each functional representation of the null hypothesis (e.g., de Paula and Cribari-Neto 1993; King and Goh 2002). Besides these corrections, simulation techniques allow researchers to generate the empirical distribution under the null hypothesis for each specification of the common factor restriction and base inferences on these reference distributions (e.g., Lafontaine and White 1986; Goh and King 1996). In particular, bootstrap methods provide a way to estimate critical values and use them as an alternative to the (corrected) asymptotic critical values which can be unreliable in finite samples (Godfrey and Veall 1998). Using the following procedure, it is straightforward to derive bootstrap critical values to test for common factors in spatial regression models:

1. Use the estimates from an unrestricted SDM model and calculate the observed Wald statistic *w* according to Equation (6) for a given restriction.

2. Estimate the restricted SEM model. With these estimates and the DGP shown in Equation (1), generate 100 bootstrap samples by resampling with replacement from the residual vector to obtain bootstrap disturbances.

3. Repeat step 1 for each bootstrap sample and store the Wald statistic in vector $\boldsymbol {\tilde {w}}$.

4. Sort $\boldsymbol {\tilde {w}}$ in ascending order. The value with rank $(1-\alpha )\times 100+1$ is the estimated bootstrap critical value, $\chi ^{2}_{boot}$, corresponding to a predefined $\alpha $-level (e.g., $\alpha =0.05$).

By comparing *w* calculated in step 1 to the corresponding bootstrap critical value $\chi ^{2}_{boot}$ from step 4, its statistical significance can be assessed. This procedure can be repeated for any functional representation of the common factor restriction in order to obtain individual bootstrap critical values for each restriction and a given region of the parameter space. Thereby, researchers can base inferences on the empirical distribution under the null hypothesis instead of relying on the asymptotic $\chi ^{2}$ distribution. This is especially important since the performance of the Wald test not only depends on the specific expression of the common factor hypothesis but also on the particular region in the parameter space. In fact, previous research shows that there is no single formulation of the restriction that consistently outperforms all alternatives (e.g., Gregory and Veall 1986; Lafontaine and White 1986; Phillips and Park 1988).
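Two mechanical parts of this procedure, generating bootstrap outcomes from the SEM DGP (step 2) and reading off the critical value from the sorted statistics (step 4), can be sketched as follows. The function names, the five-unit ring used as $\boldsymbol {W}$, and all numbers are hypothetical; the SDM and SEM estimation needed for steps 1 to 3 is deliberately left out and would have to be supplied by whatever estimator the researcher uses. Generalizing the rank rule from 100 to an arbitrary number of replications is likewise an assumption of this sketch.

```python
import numpy as np

def sem_bootstrap_samples(x, W, beta_hat, rho_hat, resid, n_boot=100, seed=0):
    """Bootstrap outcomes y* = x*beta_hat + (I - rho_hat*W)^{-1} u*, with u*
    drawn with replacement from the SEM residual vector (step 2)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    A = np.eye(n) - rho_hat * W
    samples = []
    for _ in range(n_boot):
        u_star = rng.choice(resid, size=n, replace=True)
        samples.append(x * beta_hat + np.linalg.solve(A, u_star))
    return np.array(samples)

def bootstrap_critical_value(w_tilde, alpha=0.05):
    """Value with (1-based) rank (1 - alpha) * B + 1 among the sorted statistics (step 4)."""
    w_sorted = np.sort(np.asarray(w_tilde, dtype=float))
    b = len(w_sorted)
    rank = min(int(round((1 - alpha) * b)) + 1, b)  # guard against rank > B
    return w_sorted[rank - 1]

# Hypothetical inputs: five units on a ring, made-up estimates and residuals
n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
x = np.linspace(-1.0, 1.0, n)
resid = np.array([0.3, -0.1, 0.2, -0.4, 0.0])

y_star = sem_bootstrap_samples(x, W, beta_hat=2.0, rho_hat=0.4, resid=resid)
print(y_star.shape)                                  # (100, 5)
print(bootstrap_critical_value(np.arange(1, 101)))   # 96.0
```

In an actual application, each row of `y_star` would be passed to the SDM estimator, the Wald statistic recomputed for the chosen formulation of the restriction, and the resulting vector handed to `bootstrap_critical_value`.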

While Goh and King (1996) demonstrate that both asymptotic modifications (the corrected critical values and the improved test statistics) might even worsen the Wald test’s power and size properties, they conclude that the bootstrap approach constitutes a useful improvement for applied research (see also Lafontaine and White 1986; Godfrey and Veall 1998; King and Goh 2002). Of course, neither the Edgeworth corrections nor simulation techniques completely resolve the noninvariance problem inherent to the Wald test. Doing so requires the application of alternative tests such as the asymptotically equivalent LR test that is invariant to such reparameterizations (e.g., Mur and Angulo 2006). Yet, by providing corrections for the Wald test’s empirical size, these modifications reduce the possibility of intentionally manipulating the result by amending the functional expression of the null hypothesis (King and Goh 2002, 260).

## 4 Monte Carlo Analysis

### 4.1 Experimental Setup

In order to investigate the finite sample performance of the Wald test, I conduct Monte Carlo experiments in which I vary the sample size, the strength of the interdependence, and the severity of the omitted variables bias through the degree of correlation between the included and the omitted regressor. Using the spatial process depicted in Equation (2), I generate $1,000$ samples of the outcome vector $\boldsymbol {y}$ for each of the parameter configurations. In the simulations, $\beta = 2$ and $\sigma _{v}^{2}=1$ are held constant and $\boldsymbol {x}$ is drawn from a standard normal distribution. The parameter space of $\gamma $ ranges from $0$ to $1$ in steps of $0.2$ while $\rho $ takes on values between $0$ and $0.8$ in steps of $0.2$.$^{10}$ This setup includes a nonspatial DGP without omitted variables bias ($\rho =0$ and $\gamma =0$), nonspatial DGPs with omitted variables bias ($\rho =0$ and $\gamma>0$), SEM DGPs ($\rho>0$ and $\gamma =0$), and SDM DGPs ($\rho>0$ and $\gamma>0$). $\boldsymbol {W}$ is a row-stochastic contiguity matrix based on the queen criterion of adjacency. In contrast to the rook connectivity scheme which links spatial units ordered on a lattice to their direct horizontal and vertical neighbors, the queen criterion additionally connects the units to their diagonal neighbors (e.g., Cliff and Ord 1981).$^{11}$ The sample sizes specified here contain $49$, $100$, $225$, and $400$ observations distributed on regular grids ($7 \times 7$, $10 \times 10$, $15 \times 15$, and $20 \times 20$) to realistically reflect small to medium sized samples frequently encountered in political science.$^{12}$
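The data-generating step of this setup can be sketched as follows: a row-stochastic queen contiguity matrix on a regular grid and one draw of $\boldsymbol {y}$ from the process in Equation (2), that is, $\boldsymbol {z}=\rho \boldsymbol {Wz}+\boldsymbol {x}\gamma +\boldsymbol {v}$ and $\boldsymbol {y}=\boldsymbol {x}\beta +\boldsymbol {z}$. The function names, the seed, and the particular values of $\rho $ and $\gamma $ are illustrative assumptions rather than the replication code of the study.

```python
import numpy as np

def queen_weights(rows, cols):
    """Row-stochastic queen contiguity matrix for a rows x cols regular grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # skip the cell itself and positions outside the grid
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        W[i, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def simulate_y(W, beta=2.0, rho=0.4, gamma=0.2, sigma_v=1.0, seed=0):
    """One draw from y = x*beta + (I - rho*W)^{-1} (x*gamma + v)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    x = rng.standard_normal(n)
    v = rng.normal(0.0, sigma_v, n)
    z = np.linalg.solve(np.eye(n) - rho * W, gamma * x + v)
    return x, x * beta + z

W = queen_weights(7, 7)          # smallest grid in the setup, n = 49
x, y = simulate_y(W, rho=0.4, gamma=0.2)
print(W.shape, y.shape)
```

Setting `gamma=0` yields a SEM draw, and setting `rho=0` removes the spatial dependence, which covers the four types of DGP listed above.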

Since the consequences of model misspecification for obtaining unbiased effect estimates have been studied elsewhere (e.g., LeSage and Pace 2009; Pace and LeSage 2010; Lacombe and LeSage 2015; Rüttenauer 2019), this Monte Carlo analysis focuses on the ability of the Wald test to identify the true spatial model and differentiate between substantive and residual dependence across a range of alternative DGPs.Footnote ^{13} To this end, I investigate the performance of the Wald test using the four alternative null hypotheses of common factors summarized in Table 1.Footnote ^{14}

### 4.2 Performance of the Original Wald Test

Table 2 reports the rejection rates of the four expressions of the null hypotheses of common factors at an $\alpha $-level of $0.05$ across the simulations when the true DGP is that of the SEM model ($\gamma =0$). Since there are no omitted spillovers in this scenario, the common factor restriction holds and the four variants of the Wald test are expected to reject the true null hypothesis in about $5\%$ of the simulation trials with a $95\%$ confidence interval of $[3.65\%; 6.35\%]$.

Note to Table 2: Based on the $\chi ^2$ distribution with $df=1$ and $\alpha = 0.05$, the asymptotic critical value used for all variants of the Wald test and across the different levels of spatial autocorrelation is $\chi ^2_{asym} = 3.841$. The theoretically expected rejection rate across the 1,000 simulation iterations is $5\%$ with a $95\%$ (binomial proportion) confidence interval of $[3.65\%; 6.35\%]$.
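Both benchmark numbers can be reproduced with the standard library. The sketch below assumes the confidence interval is the usual normal-approximation (Wald-type) interval for a binomial proportion, which matches the reported bounds; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05    # nominal significance level
n_sims = 1000   # Monte Carlo iterations per configuration

# For df = 1, the chi-square quantile at 1 - alpha equals the squared
# two-sided standard normal quantile.
z = NormalDist().inv_cdf(1 - alpha / 2)
chi2_crit = z ** 2

# Normal-approximation interval around the nominal rejection rate.
half_width = z * sqrt(alpha * (1 - alpha) / n_sims)
ci = (alpha - half_width, alpha + half_width)

print(round(chi2_crit, 3), [round(100 * b, 2) for b in ci])
```

Observed rejection rates outside this interval indicate that a test is oversized or undersized by more than simulation noise alone would explain.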

Although Section 3 analytically shows that the alternative Wald tests are asymptotically equivalent, their type I error rates differ notably in finite samples. Especially $H_{0}(II)$ but also $H_{0}(IV)$ deviate considerably from the expected error rate. Across all sample sizes, $H_{0}(II)$ is too conservative when $\rho $ is small. Since $\hat {\rho }$ appears in the restriction’s denominator (see Table 1), the restriction has no derivative at $\rho = 0$, which violates the assumed continuity of derivatives. However, the Wald test based on $H_{0}(II)$ remains valid as its asymptotic distribution is obtained under the null hypothesis, which precludes the problematic value (Gregory and Veall 1985).Footnote ^{15}

At the same time, incorrectly rejecting a true null hypothesis might be less problematic in this case since the SDM model yields unbiased impact estimates even if only the residuals are spatially clustered and no substantive spillovers exist (e.g., Elhorst 2010). The only drawback is that the appropriate SEM specification would be more efficient, which might affect inferences regarding the statistical significance of a regressor’s impact. Consequently, it is crucial for any test of the common factor hypothesis to have satisfactory power properties in order to reduce concerns about biased effect estimates.

Against this background, Figure 1 compares the performance of the alternative variants of the Wald test for different levels of correlation between the spatially dependent omitted variable and the included regressor by reporting their power. As the correlation increases, the tests should be more likely to reject the null hypothesis. To account for the effects of the sample size and the strength of the interdependence on the performance of the tests, Figure 1 comprises 16 panels. In each panel, the horizontal axis depicts the different values of $\gamma $ and the vertical axis shows the observed share of rejections across the simulation trials.

A brief inspection of Figure 1 already confirms that alternative parameterizations of the null hypothesis, although algebraically equivalent, yield strikingly different results in finite samples. Even with a decently sized sample, there are pronounced differences in the rejection rates of the four Wald tests. While $H_{0}(I)$, which is considered to be the common way to express the restriction (Gregory and Veall 1986, 204), and $H_{0}(III)$ perform comparatively well in these simulations, the specifications based on $H_{0}(II)$ and $H_{0}(IV)$ have inferior power properties. The remarkably low rejection rates of these specifications of the Wald test increase the likelihood that researchers incorrectly infer the absence of meaningful spillover effects.

Moreover, the behavior of $H_{0}(IV)$ differs greatly from the expectation as its rejection rate initially decreases in almost all parameter settings as $\gamma $ increases. This phenomenon, known as nonmonotonicity in the power function, makes the rejection of the null hypothesis even less likely as the difference between the true DGP and the restriction increases (King and Goh 2002, 256–58).Footnote ^{16} In practice, these tests would suggest that the data was generated by a DGP with spatially correlated errors even if there are sizable spillover effects. Researchers would incorrectly conclude that an SEM model or even a nonspatial OLS model appropriately represents the unobservable DGP. Given that these model specifications produce biased impact estimates if an SDM process generated the data, the low rejection rates are highly problematic for substantive inferences.

Although the different variants of the Wald test use the same data and identical parameter estimates, this simulation study shows that, depending on the functional representation of the null hypothesis, they can come to contradictory conclusions regarding the validity of the common factor restriction.Footnote ^{17} In fact, Breusch and Schmidt (1988) analytically show that it is possible to obtain any desired Wald statistic by appropriately specifying the restriction, which opens up the possibility to intentionally manipulate the test result (see also King and Goh 2002). Therefore, any search strategy utilizing the Wald test, like the basic *general-to-specific* approach or the multistep procedure suggested by Elhorst (2014a), is subject to this malfunctioning. Since there is no theoretically justified functional representation of the common factor hypothesis and given the strikingly large share of inconsistent inferences across a range of parameter settings, the evidence presented here strongly cautions against the use of the standard Wald test based on an asymptotic reference distribution.
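The mechanics behind such contradictions can be illustrated with a small numerical sketch of the delta-method Wald statistic for a single nonlinear restriction, $w = g(\hat{\boldsymbol{\psi}})^2 / (\nabla g' \hat{\boldsymbol{V}} \nabla g)$. Everything in it is an assumption made for illustration: the estimates and the diagonal covariance matrix are invented, `theta` is a hypothetical name for the coefficient on the spatially lagged regressor, and the two restrictions are generic product and ratio forms of a common factor constraint that need not coincide with the expressions in Table 1.

```python
def wald_stat(g_value, gradient, cov):
    """Wald statistic for one restriction: g^2 / (grad' V grad)."""
    k = len(gradient)
    variance = sum(
        gradient[i] * cov[i][j] * gradient[j]
        for i in range(k)
        for j in range(k)
    )
    return g_value ** 2 / variance


# Hypothetical SDM estimates, ordered (rho, beta, theta), and an invented
# covariance matrix; none of these numbers come from the article.
rho, beta, theta = 0.4, 2.0, -0.5
cov = [
    [0.004, 0.0, 0.0],
    [0.0, 0.008, 0.0],
    [0.0, 0.0, 0.012],
]

# Product form of the restriction: rho * beta + theta = 0.
w_product = wald_stat(rho * beta + theta, [beta, rho, 1.0], cov)

# Ratio form of the same restriction: beta + theta / rho = 0.
w_ratio = wald_stat(
    beta + theta / rho,
    [-theta / rho ** 2, 1.0, 1.0 / rho],
    cov,
)

CRITICAL_VALUE = 3.841  # asymptotic chi-square(1) value at the 5% level
print(w_product > CRITICAL_VALUE, w_ratio > CRITICAL_VALUE)
```

With these invented inputs the product form stays below the asymptotic critical value while the ratio form exceeds it, although both are computed from the same estimates and the same covariance matrix.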

### 4.3 Performance of the Modified Wald Test Based on Bootstrap Critical Values

While the simulations performed here illustrate that the standard Wald test based on asymptotic critical values is unreliable for the identification of the unobservable spatial process, this section investigates whether the application of simulated reference distributions improves the test’s performance. To this end, I use the bootstrap procedure outlined in Section 3.3 and compare the observed Wald statistics based on the different formulations of the common factor restriction to their estimated critical values.

The results reported in Table 3 show that the estimated critical values from the bootstrap approach, $\chi ^{2}_{boot}$, displayed in parentheses, not only differ from their asymptotic counterpart $\chi ^{2}_{asym} = 3.841$, on which the original Wald test is based, but also reveal sizable discrepancies between the alternative parameterizations of the null hypothesis. While the estimated critical values for $H_{0}(I)$ and $H_{0}(III)$ are always higher than $\chi ^{2}_{asym}$, the simulated null distributions of $H_{0}(II)$ suggest much smaller critical values for this expression of the null hypothesis in most scenarios.

Note to Table 3: For the different levels of spatial autocorrelation, the median bootstrap critical values $\chi ^{2}_{boot}$ for each variant of the Wald test at the nominal significance level of $5\%$ are displayed in parentheses. Again, the theoretically expected rejection rate across the simulation trials is $5\%\; [3.65\%; 6.35\%]$.
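Once Wald statistics have been simulated under the null hypothesis, turning them into a critical value and a *p* value is straightforward. The sketch below is generic and makes no claim about the exact procedure in Section 3.3: the function names are hypothetical, the quantile rule (smallest order statistic covering at least $1-\alpha$ of the draws) is one of several common conventions, and the null draws are placeholder values.

```python
import math


def bootstrap_critical_value(null_stats, alpha=0.05):
    """Empirical (1 - alpha) quantile of the simulated null statistics."""
    ordered = sorted(null_stats)
    # Rounding guards against floating-point noise in (1 - alpha) * B.
    rank = math.ceil(round((1 - alpha) * len(ordered), 10))
    return ordered[rank - 1]


def bootstrap_p_value(observed, null_stats):
    """Share of simulated null statistics at least as large as observed."""
    return sum(s >= observed for s in null_stats) / len(null_stats)


# Placeholder null draws; in an application these would be the Wald
# statistics recomputed on each bootstrap sample generated under the null.
null_draws = [float(v) for v in range(1, 101)]

crit = bootstrap_critical_value(null_draws)
p_boot = bootstrap_p_value(96.0, null_draws)
print(crit, p_boot)
```

Because the simulated reference distribution is built separately for each functional form of the restriction, each parameterization receives its own critical value instead of sharing the single asymptotic one.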

Since the functional expression of the nonlinear common factor restriction determines the Wald statistic’s rate of convergence to the asymptotic $\chi ^{2}$ distribution, estimating critical values for each alternative parameterization improves the empirical size of the Wald test in finite samples. Compared to the original tests based on the asymptotically derived critical value, the observed rejection rates of each of the four variants of the Wald test are closer to the nominal significance level of $5\%$ across all sample sizes. Whereas the observed rejection rate of $H_{0}(I)$ ranges from $6.9\%$ to $11\%$ across the different values of $\rho $ for a sample size of $n=49$ when relying on the asymptotic $\chi ^{2}$ distribution (see Table 2), its corresponding range is narrowed to $5.1\%$ to $5.8\%$ when using bootstrap critical values. Similarly, the bootstrap critical values even improve the size of $H_{0}(IV)$, which performed poorly under the asymptotic reference distribution. For $n=49$, basing inferences on the simulated null distribution narrows the range of rejection rates from $2.6\%$ to $11.5\%$ down to $4.1\%$ to $6.8\%$ across the different levels of spatial autocorrelation.

In conclusion, the Monte Carlo evidence presented here demonstrates that using the simulated null distribution as a reference distribution and basing inferences on estimated rather than asymptotically derived critical values ameliorates the problems posed by the Wald test’s lack of invariance to alternative parameterizations of the common factor hypothesis.Footnote ^{18} Since the bootstrap critical values account for differences in the convergence rates of the Wald statistics, this modification constitutes a superior alternative that facilitates the empirical assessment of the common factor restriction in spatial regression models. Alternatively, the LR test constitutes another option that is invariant to such reparameterizations (Godfrey and Veall 1998).Footnote ^{19} Hence, irrespective of the empirical model search strategy employed, researchers should utilize the modified Wald test based on the simulated null distribution or the LR test in order to empirically evaluate the appropriateness of the spatial model employed.

## 5 Empirical Example: Spatial Contagion Effects in Economic Voting

An empirical example helps to demonstrate the consequences of the problem for applied research aiming to evaluate the empirical evidence for a theorized mechanism while ruling out alternative mechanisms. To this end, I reanalyze a study conducted by Williams and Whitten (2015) that investigates spatial contagion effects, understood as the process by which “[…] a policy success or failure of one political party in the eyes of voters similarly affects those parties that are ideologically proximate” (Williams and Whitten 2015, 312). The utilization of different sample sizes and the availability of a plausible alternative mechanism make this study an ideal case to investigate the consequences of the Wald test’s lack of invariance to reformulations of the common factor hypothesis.

Williams and Whitten (2015) argue that the electorate not only rewards or punishes the parties forming the current government for the country’s economic performance at the ballot box, as predicted by the economic voting hypothesis. Since voters group parties based on their ideological stances, the effect of economic prosperity also spills over to ideologically proximate parties irrespective of whether or not these parties also belong to the government. These indirect effects conjectured by the authors link the economic wellbeing of a country to the electoral performance of opposition parties. Therefore, the study contributes to the literature on electoral competition by combining insights from the hitherto separate literatures on economic voting and spatial party competition.

To assess the empirical support for the proposed mechanism, Williams and Whitten (2015, 315–16) analyze data on electoral contests in 23 parliamentary democracies from 1951 to 2005, where the parties constitute the unit of analysis. The change in a party’s vote share between two consecutive elections is the dependent variable and the country’s economic performance, measured by the real GDP per capita growth, is the main regressor of interest.Footnote ^{20} In their study, the authors emphasize the importance of spatial regression models, which facilitate the estimation of the theoretically expected contagion effects in the form of spatial spillovers. They choose the SAR model specification in order to quantify (global) spillover effects (Williams and Whitten 2015, 313–14). In line with the economic voting literature suggesting that the voters’ ability to clearly attribute the responsibility for the economic (mis)fortune is a necessary precondition for economic voting to occur, they estimate separate SAR models for elections with high ($n=398$) and low levels of clarity ($n=1{,}030$). While economic voting itself should be less pronounced in elections characterized by a low clarity of responsibility because it is harder for voters to hold a party accountable for the country’s economic performance, the authors expect to find larger spatial contagion effects in this context. The argument proposed in the study is that in low clarity settings, the electorate is more experienced in switching their support from one party to an ideologically similar party. Therefore, the voters’ sophistication in terms of reallocating their support creates stronger interdependencies between parties in low clarity elections as compared to high clarity settings where voters can easily identify the party that is responsible for the national state of the economy (Williams and Whitten 2015, 312–13).

Given these theoretical expectations, the SAR model constitutes an appropriate choice as it links the electoral fortune of a party to the performance of the other parties and makes it possible to distinguish between direct and indirect effects of economic prosperity on the parties’ vote shares. Yet, while unfocused diagnostics, like Moran’s *I*, indicate the existence of spatial interdependencies, it is possible that an alternative spatial process caused the clustering detected in the data.Footnote ^{21} For example, unmodeled election-specific particularities that are unrelated to a country’s economic performance, like the general appeal of a candidate or political scandals, might affect the election outcome of ideologically proximate parties as well. Since these factors are not part of the regression’s systematic part, they potentially cause spatial clustering in the residuals. Instead of substantively meaningful contagion effects, this plausible alternative process implies no indirect effect of economic performance but a mere diffusion of shocks, which would be adequately captured by the SEM model. Consequently, there is a risk that the SAR models specified by the authors lead to incorrect inferences regarding the existence of contagion effects.

To demonstrate the substantive differences between the two alternative spatial processes, Figure 2 displays the estimated direct and indirect effects of the main regressor of interest, economic performance, on the vote share of opposition parties derived from the SAR, SEM, and SDM models.Footnote ^{22} As the theory suggests, spatial contagion effects should mitigate the negative effect of a strong economy for opposition parties. This is because the beneficial effect of positive economic conditions for governing parties spills over to ideologically neighboring opposition parties. In contrast, if merely the errors are spatially correlated, no contagion takes place and only a direct negative effect of a country’s economic performance exists for opposition parties.

Despite a significant spatial parameter estimate, Figure 2 illustrates that the alternative spatial models suggest no indirect spillover effect of economic growth in low clarity elections.Footnote ^{23} In high clarity elections, only the SAR model identifies significant spillover effects. While the SEM model assumes no spillovers, the average indirect impact of economic growth on the change in vote share for opposition parties as estimated by the SAR model is $0.020$ with a simulated $95\%$ confidence interval within $[0.004; 0.045]$.Footnote ^{24} In contrast, the estimate derived from the SDM specification is $0.025\; [-0.208; 0.244]$, suggesting no significant spillovers. Besides the SDM model’s remarkable efficiency loss, this example illustrates that the identification of the theorized contagion effects is contingent on the specification of the underlying spatial process. Although the SAR and SEM models produce similar and statistically indistinguishable total impact estimates of economic performance in high clarity elections, the results have very different theoretical implications.Footnote ^{25} Notably, the overall impact of economic growth on an opposition party’s vote share in the SEM model solely consists of the direct impact of $x_{i}$ on $y_{i}$. In contrast, the SAR model also identifies significant indirect impacts. Therefore, while the SAR model supports the theory of spatial contagion effects, there are no substantive spillover effects in the SEM model which highlights the importance of adequately distinguishing between these alternative processes for substantive inferences.

To address the problem of model misspecification and to empirically distinguish the two plausible spatial processes, I implement the Wald test of common factors by using the SDM model estimates and the four specifications of the common factor restriction outlined in Table 1. If the data supports the theory of indirect contagion effects, the tests should reject the null hypothesis. Yet, Table 4 illustrates that in both high and low clarity contexts, the alternative Wald tests differ not only in their test statistics. Based on the asymptotic critical values, they also come to substantively different conclusions regarding the existence of the spillover effects. While $H_{0}(IV)$ supports the theory proposed by Williams and Whitten (2015), the other three alternative versions of the Wald test fail to reject the common factor hypothesis. Instead of substantively meaningful spillovers, these tests only indicate residual dependence, which implies that no spatial contagion takes place among the parties. Given the rather large number of observations in the low clarity scenario, these differences become even more alarming. Alternatively, when inferences regarding the underlying spatial process are based on the simulated null distribution of each parameterization of the common factor restriction, all four variants of the Wald test fail to reject the null hypothesis at conventional significance levels.

Note to Table 4: *w* is the observed Wald statistic. $p_{asym}$ and $p_{boot}$ denote *p* values based on asymptotic critical values $\chi ^{2}_{asym}$ and on bootstrap critical values $\chi ^{2}_{boot}$, respectively. While each restriction has an individual simulated critical value, they share a single asymptotic critical value, which depends on the $\alpha $-level and the number of degrees of freedom.

Taken together, this empirical case study confirms the Monte Carlo evidence by demonstrating that relying on bootstrap critical values in order to identify statistically significant deviations from the Wald statistic’s null distribution improves its finite sample performance and alleviates the conflict between alternative parameterizations of the nonlinear common factor hypothesis. While the tests based on the asymptotic $\chi ^{2}$ distribution come to contradictory conclusions even with a sample size of more than $1,000$ observations, the bootstrap procedure is able to correct for this undesirable circumstance. Regarding the theorized mechanism, this analysis finds insufficient evidence to convincingly dispel doubts that, instead of the theorized spatial contagion effects, correlation in the residuals caused the spatial clustering found in the data.

## 6 Conclusion

Distinguishing substantively meaningful indirect spillover effects from a mere diffusion of random shocks is essential as there is a serious risk of making incorrect inferences when estimating a misspecified model. Yet, the task of appropriately modeling the process underlying observable patterns of interrelatedness between the units poses notable difficulties for political scientists. Although many empirical specification search procedures rely on the Wald test to assess the nonlinear common factor restriction, the test’s lack of invariance to algebraically equivalent formulations of the null hypothesis poses a serious problem for the accuracy of inferences.

This study investigates the consequences of the Wald test’s sensitivity to alternative and algebraically equivalent expressions of the common factor hypothesis for its ability to guide the empirical model specification search. By presenting analytical evidence and using Monte Carlo simulations as well as an empirical example, it shows that the necessity to approximate the sampling variability of a nonlinear function by a Taylor series expansion causes the Wald test’s sensitivity to algebraically equivalent reparameterizations of the null hypothesis. While asymptotically valid, this approximation produces considerable differences in finite samples, depending on the restriction’s functional representation. In many instances, alternative null hypotheses even suggest contradictory conclusions regarding the underlying spatial process since they converge to the Wald statistic’s asymptotic $\chi ^{2}$ distribution at different rates. Given that there is no theoretical justification for any particular expression and since their performance is contingent on the relevant region of the parameter space, the results caution against relying on the Wald test’s asymptotic results in any specification search strategy. Instead, practitioners should either base inferences on a simulated null distribution by estimating bootstrap critical values or turn to the LR test which is invariant to such reparameterizations in order to avoid spurious inferences.

Subsequent research might continue this line of inquiry by developing more reliable strategies that help practitioners to differentiate between substantive and residual dependence. As Mur and Angulo (2009) show, the evidence in favor of any search strategy proposed in the literature is mixed, which explains the debate about the most appropriate strategy and prevents the development of general guidelines for the empirical identification of the correct model specification (Rüttenauer 2019, 16). In this regard, Lacombe and LeSage (2015), for example, demonstrate that Bayesian methods constitute a promising alternative to the frequentist null hypothesis significance testing approach. Additionally, multimodel inference might help overcome the current fixation on model selection and instead allow researchers to focus on the identification of substantively meaningful spillover effects in the data (see also Juhl 2020b). Especially regarding the considerable difficulties researchers face when attempting to empirically distinguish between different spatial processes (e.g., Gibbons and Overman 2012), following this line of investigation will enhance model building and contribute to our understanding of different interaction effects among the units of analysis.

Spatial autocorrelation poses notable challenges for the correct specification and interpretation of statistical models as model misspecification can bias the substantive inferences. Notwithstanding these difficulties, interdependencies are paramount in social science theories, which obliges researchers to carefully consider the process generating these dependencies when building empirical models in order to make valid inferences with respect to the theories. Consequently, especially in the absence of design-based identification strategies as proposed by Gibbons and Overman (2012), methodological research facilitating the appropriate specification of spatial models constitutes an important contribution to a thorough assessment of theoretical expectations.

## Funding

This research was funded by the German Research Foundation (DFG)—Project-ID 139943784—SFB 884. I also gratefully acknowledge support by the state of Baden-Württemberg through the High Performance Computing Cluster bwHPC (INST 35/1134-1 FUGG) and the University of Mannheim’s Graduate School of Economic and Social Sciences.

## Acknowledgments

This project has been presented at the EPSA conference 2019 in Belfast, Northern Ireland. I would like to thank Lion Behrens, Thomas Bräuninger, Thomas Plümper, Akisato Suzuki, Garrett Vande Kamp, and Laron K. Williams, the participants of the 2019 CDSS Political Science Colloquium at the University of Mannheim, and four anonymous reviewers as well as the journal’s editor Jeff Gill for helpful comments.

## Data Availability Statement

Replication code for this article has been published in Code Ocean, a computational reproducibility platform that enables users to run the code, and can be viewed interactively at https://doi.org/10.24433/CO.1459046.v1. A preservation copy of the same code and data can also be accessed via Dataverse at Juhl (2020a).

## Supplementary Material

For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2020.23.