The Wald Test of Common Factors in Spatial Model Specification Search Strategies

Abstract

Distinguishing substantively meaningful spillover effects from correlated residuals is of great importance in cross-sectional studies. The two forms of spatial dependence have different implications not only for the choice of an unbiased estimator but also for the validity of inferences. To guide model specification, different empirical strategies involve the estimation of an unrestricted spatial Durbin model and subsequently use the Wald test to scrutinize the nonlinear restriction of common factors implied by pure error dependence. However, the Wald test's sensitivity to algebraically equivalent formulations of the null hypothesis has received scant attention in the context of cross-sectional analyses. This article shows analytically that the noninvariance of the Wald test to such reparameterizations stems from the application of a Taylor series expansion to approximate the restriction's sampling distribution. While the test is asymptotically valid, Monte Carlo simulations reveal that alternative formulations of the common factor restriction frequently produce conflicting conclusions in finite samples. An empirical example illustrates the substantive implications of this problem. Consequently, when deciding on the model specification that adequately reflects the spatial process generating the data, researchers should either base inferences on bootstrap critical values for the Wald statistic or use the likelihood ratio test, which is invariant to such reparameterizations.


Motivation
The correct specification of the inherently unknown spatial process generating observable patterns of interrelatedness among the units of analysis constitutes a considerable challenge in cross-sectional studies. In particular, distinguishing substantively meaningful indirect spillover effects from spatially correlated random shocks is imperative, as there is a serious risk of making incorrect inferences when estimating a misspecified model (e.g., LeSage and Pace ; Darmofal ). Unfortunately, while unfocused tests for spatial autocorrelation commonly applied in empirical research, like Moran's I (e.g., Cliff and Keith Ord ), help to detect spatial clustering in model residuals, they do not provide guidance on the exact process generating these dependencies. Since spatial regression models differ with respect to the implied pathways of dependence, these simple diagnostic tools do not allow researchers to identify the adequate model specification.
To address this specification problem, many empirical model selection strategies proposed in the literature feature the estimation of the spatial Durbin model (SDM) as an unrestricted nesting model and use the Wald test to scrutinize the nonlinear common factor restrictions implied by pure error dependence (e.g., Burridge ). Since the Wald test is asymptotically equivalent to the likelihood ratio (LR) and the Lagrange multiplier (LM) tests, the two alternative likelihood-based specification tests, the choice of a test statistic is oftentimes motivated by convenience or familiarity (e.g., LeSage and Pace , ). However, while previous studies in the field of spatial econometrics report notable differences with respect to the tests' finite sample properties (e.g., Mur and Angulo ; Mur and Angulo ), the Wald test's sensitivity to algebraically equivalent formulations of the null hypothesis has been largely overlooked. Given that this result is well established in a time-series context (e.g., Gregory and Veall ; Lafontaine and White ; Breusch and Schmidt ; Dagenais and Dufour ; de Paula and Cribari-Neto ; Goh and King ), and given the importance of distinguishing spillover effects from residual correlation for substantive inferences, this neglect is startling.
There is an ongoing debate in the spatial econometrics literature about whether the specific-to-general or the general-to-specific approach should be used to identify the true data-generating model (e.g., Florax et al. ; Florax et al. ; Hendry ; Elhorst a). However, both approaches have their disadvantages, and there is no conclusive evidence for the superiority of either search strategy (Mur and Angulo ; Rüttenauer ). Consequently, many search procedures rely on a combination of both approaches (e.g., Mur and Angulo ; Elhorst ; Elhorst a).
By remedying this omission, the present study evaluates the Wald test's appropriateness for differentiating between alternative mechanisms that cause spatial clustering in cross-sectional data structures. It discusses the substantive and econometric implications of alternative spatial processes and shows analytically that the Wald test's lack of invariance to reparameterizations of nonlinear common factor restrictions stems from the necessity to approximate the restrictions' sampling distributions. While the test remains asymptotically valid, Monte Carlo experiments demonstrate that this approximation frequently leads to misleading inferences concerning the presence of spillover effects across a wide range of parameter settings in finite samples. An empirical example further illustrates the severity of this problem for applied research that aims to assess the support for distinct theoretical mechanisms against possible alternative explanations. Given that a misspecification of the process generating cross-sectional dependencies can bias substantive inferences, the results suggest that, irrespective of the specification search strategy employed, researchers should not base inferences on the Wald statistic's asymptotic $\chi^2$ distribution. Instead, simulation techniques such as bootstrap methods allow researchers to use estimated critical values as an alternative to their asymptotic counterparts. The LR test also offers a valuable alternative procedure that is invariant to reparameterizations of the null hypothesis.

Substantive and Residual Dependence in Cross-Sectional Models
In regression analyses utilizing cross-sectional data, three different types of interaction effects can be distinguished that generate spatial autocorrelation in the dependent variable. First, endogenous interaction effects occur whenever the units' outcomes are intertwined. In these situations, the actions, decisions, or behaviors of the units are simultaneously determined and responsive to the other units' outcomes. Second, exogenous interaction effects cause spatial clustering by linking the response of each unit to the covariates of other units. Finally, cross-sectional dependencies can be a product of spatially correlated model residuals (e.g., Elhorst b; Halleck Vega and Elhorst ). While endogenous and exogenous interaction effects are part of the regression's systematic component, correlation among the error terms is confined to the model residuals and does not affect the expectation of the outcome conditional on the regressors.
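The three channels can be located in a single equation, the general nesting spatial (GNS) specification. The display below is a sketch that reuses this article's symbols $\rho$, $\theta$, and $W$; the symbol $\lambda$ for the error-dependence parameter and $\varepsilon$ for the independent innovation are assumptions introduced only here, to keep the error channel visually distinct from $\rho$ (the article itself also writes the SEM parameter as $\rho$).

```latex
y = \underbrace{\rho W y}_{\text{endogenous interaction}} + X\beta
    + \underbrace{W X \theta}_{\text{exogenous interaction}} + u,
\qquad
u = \underbrace{\lambda W u}_{\text{correlated errors}} + \varepsilon
```

Setting $\lambda = 0$ yields the SDM; $\theta = 0$ and $\lambda = 0$ the SAR model; $\rho = 0$ and $\lambda = 0$ the SLX model; and $\rho = 0$ and $\theta = 0$ the SEM. Only the first two channels enter the systematic component.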
Despite their close correspondence (see e.g., Gibbons and Overman ), the distinction between the different mechanisms causing spatial dependencies in the data has far-reaching implications for the estimation and interpretation of the regression coefficients (LeSage and Pace ; Rüttenauer ). Importantly, substantively meaningful indirect spillover effects, loosely defined as the impact of changes in one unit's covariates on the other units' outcomes, only exist if cross-unit interactions are part of the regression's systematic component (Elhorst ; Darmofal ; Halleck Vega and Elhorst ). In these instances, the cross-partial derivative of unit $i$'s outcome $y_i$ with respect to unit $j$'s covariate $x_j$ is nonzero, signifying a systematic relationship between the units (e.g., LeSage and Pace , ). Otherwise, the regression model imposes the restriction of no spillovers by assumption. To detect the existence of these spillover effects, different model specification search strategies suggest using the unrestricted SDM model as a general model featuring substantive as well as residual dependence and subsequently testing several parameter restrictions (e.g., Mur and Angulo ; LeSage and Pace ; Elhorst ).
By virtue of the Gauss-Markov assumptions, nonspatial regression models typically estimated by ordinary least squares (OLS) do not incorporate any spatial effects. In a regression model with a single regressor $x$, the direct effect of a change in $x_i$ for unit $i$ on the unit's outcome $y_i$ is $\partial E(y_i|x_i)/\partial x_i = \beta$, estimated by $\hat\beta_{OLS} = (x'x)^{-1}x'y$, while the same change has no effect on any other unit: $\partial E(y_j|x_i)/\partial x_i = 0$ for all units $j \neq i$. By the same token, the spatial error model (SEM) specification does not feature spillover effects, since residual dependence does not affect $E(y_i|x_i)$.

An Illustrative Example of the Different Spatial Processes
Before outlining the alternative spatial model specifications, it is useful to contrast the different spatial processes with respect to their substantive implications for empirical political science research. Spillover effects occur whenever the behavior (endogenous interactions) or certain characteristics (exogenous interactions) of one unit (be it a country, a (coalition) government, a political party, or any other entity of interest) affect adjacent units (Darmofal , ). For example, municipalities' income tax revenues are directly related to their local economic performance. At the same time, the economic prosperity of its neighbors also exerts a positive impact on a municipality's income tax revenues, as its residents might commute to work in neighboring municipalities. These dependencies in income tax revenues produce spillovers between the municipalities: the effect of a change in one unit's characteristics propagates to its neighbors (e.g., LeSage and Pace ). Hence, adequately understanding the phenomenon of interest, cross-sectional variation in income tax revenues, necessitates the consideration of these exogenous interaction effects among the municipalities that generate substantively meaningful and theoretically relevant indirect spillover effects.
In contrast, cross-sectional dependencies in the disturbances constitute another spatial process that has different substantive implications and requires an alternative model specification. Several circumstances can cause correlation between the units' residuals, for example, spatial clustering in measurement errors. Alternatively, omitting from the regression equation a spatially dependent explanatory variable that is part of the true data-generating process (DGP) leads to correlated errors (Elhorst b; Darmofal ). With regard to the example given above, it is reasonable to expect that several unobservable but spatially dependent characteristics, like a region's general appeal as a place of residence, affect a municipality's revenue from income taxation as well. In contrast to the theorized exogenous interaction effects, these omitted characteristics merely affect the model residuals, and there are no relevant indirect effects present in the process that generates the data. Since modeling the true dependence structure exactly is almost impossible (e.g., Juhl b), omitting relevant variables that are spatially correlated can create linkages among the units' disturbances.
Different theoretical mechanisms can produce substantively meaningful spillover effects. Shipan and Volden ( ), for example, distinguish between four different mechanisms of policy diffusion: learning, economic competition, imitation, and coercion. Acknowledging this, I leave aside a thorough discussion of alternative mechanisms and restrict the focus to the empirical modeling of cross-sectional dependencies.

Common Factors in the Spatial Durbin Model
While the previous example illustrates the crucial differences in the substantive implications of the underlying process that links the observations to one another, model misspecification may cause severe econometric problems as well. In fact, neglecting cross-sectional interdependencies can induce correlation between the regressors and the residuals, resulting in the canonical endogeneity problem (e.g., Gibbons and Overman ; Betz, Cook, and Hollenbach ). In order to illustrate the problem of omitted spillover effects, consider the stylized DGP in which a dependent variable $y$ is entirely determined by two uncorrelated regressors, denoted $x$ and $z$, such that $y = x\beta + z$ (for the sake of simplicity, the coefficient associated with $z$ is set to 1 and omitted from the equation). Assume that $z$ is unobservable and follows a spatial autoregressive process such that $z = \rho W z + u_1$, where $\rho$ is a scalar parameter, $W$ is an exogenously defined connectivity matrix, and $u_1$ is a vector of independent and identically distributed normal disturbances with zero mean and a fixed variance. This scenario leads to the spatial error model (SEM) specification:

$$ y = x\beta + (I_n - \rho W)^{-1} u_1 $$

Due to the uncorrelatedness of $x$ and $z$, omitting the spatially autocorrelated variable does not lead to endogeneity concerns.
Furthermore, the SEM DGP implies no indirect spillover effects, since the cross-unit interactions are confined to the residuals. Consequently, even a nonspatial OLS model specification would provide unbiased but inefficient parameter estimates (Lacombe and LeSage ). Now consider a slightly modified scenario in which the included and the omitted regressors are no longer independent of one another but correlated. To induce correlation between the variables, suppose that the random variable $u_1$ in the SEM DGP is replaced by $u_2$, which is an additive linear function of $x$ and a stochastic disturbance term $v$ such that $u_2 = x\gamma + v$. In addition to being spatially autocorrelated, the unobserved covariate $z$ is now also correlated with $x$, and the scalar $\gamma \in (0, 1]$ as well as the dispersion of $v$ ($\sigma^2_v$) jointly determine the strength of the correlation. In this modified scenario, the true DGP becomes

$$ y = x\beta + (I_n - \rho W)^{-1}(x\gamma + v) $$

In contrast to the SEM DGP, the correlation between the included regressor $x$ and the spatially clustered variable $z$ causes an endogeneity problem if $z$ is omitted from the regression's systematic part. Importantly, the effect of a change in regressor $x$ on the outcome $y$ in this DGP is more complex, and neither a nonspatial OLS model nor a SEM model yields unbiased estimates, since both ignore the indirect effects produced by the spatial patterning of the correlated omitted variable. Consequently, the effect of a change in $x_i$ is not confined to unit $i$'s outcome $y_i$ but rather pertains to all nonisolated units in the entire system through indirect spillover and instantaneous feedback effects (e.g., LeSage and Pace ; Betz et al. ). In order to address the endogeneity problem and to identify meaningful spillover effects, the unrestricted SDM model plays an important role.
In fact, it serves as a general nesting model in many specification search strategies, since it comprises several simpler spatial regression models frequently employed in empirical studies (e.g., Mur and Angulo ; Elhorst ; Angulo and Mur ). By allowing researchers to test different parameter restrictions, the SDM model facilitates the specification of an econometric model that most appropriately reflects the spatial process generating the data. For the hypothetical scenario with one regressor $x$ and the (possibly correlated) unobserved variable $z$ discussed here, the SDM model to be estimated takes the following form:

$$ y = \rho W y + x\beta + W x\theta + \varepsilon $$

The SDM model reduces to the popular spatial autoregressive (SAR) model, also known as the spatial lag model (e.g., Elhorst b, ), which features global spillover effects, if $\theta = 0$, and to the spatial lag of X (SLX) model featuring local spillovers when $\rho = 0$. Beyond these easily verified special cases, it also subsumes the SEM model, which rules out any substantive indirect effects by assumption. To illuminate the relationship between these models, it is useful to restate the SEM DGP by multiplying both sides of the equation by $(I_n - \rho W)$ and rearranging terms, which results in the following structural form (Burridge ; Anselin ):

$$ y = \rho W y + x\beta - \rho W x\beta + u_1 $$

This structural form elucidates that the SEM process imposes a nonlinear common factor restriction on the parameter associated with $Wx$, which can be assessed using estimates from the unrestricted SDM model. More precisely, if the estimates from the unrestricted SDM model satisfy the common factor restriction $\theta = -\rho\beta$, the model can be simplified to a SEM model specification because there are no discernible substantive spillover effects in the data. Given that $E(\hat\beta) - \beta = 0$ under the SEM DGP, the common factor restriction holds for SEM processes.
Also assume that $\rho$ is contained in the open interval $(\omega_{\min}^{-1}, \omega_{\max}^{-1})$, where $\omega_{\min}$ and $\omega_{\max}$ are the smallest and largest eigenvalues of $W$. This stability constraint ensures that the matrix $(I_n - \rho W)$ is nonsingular, so that its inverse exists (e.g., LeSage and Pace ; Elhorst b). While the general nesting spatial (GNS) model incorporates all possible types of cross-sectional interaction effects, it tends to be overparameterized. Hence, it provides no additional information and is rarely used in applied studies (e.g., Elhorst b; Rüttenauer ).
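Written out step by step, the restatement of the SEM DGP proceeds as follows. Nothing beyond the multiplication by $(I_n - \rho W)$ described in the text is assumed; the last line is arranged so that its terms can be matched one by one against the SDM.

```latex
\begin{aligned}
y &= x\beta + (I_n - \rho W)^{-1} u_1 \\
(I_n - \rho W)\, y &= (I_n - \rho W)\, x\beta + u_1 \\
y &= \rho W y + x\beta + W x\,(-\rho\beta) + u_1
\end{aligned}
```

Matching the last line with the SDM gives $\theta = -\rho\beta$: the spatial filter $(I_n - \rho W)$ is a common factor of $y$ and $x\beta$.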
However, if $x$ and $z$ are correlated in the true DGP, an endogeneity problem occurs since the variable $z$ is unobserved, and the common factor restriction no longer holds. Restating the DGP that features this correlation in a similar fashion yields

$$ y = \rho W y + x(\beta + \gamma) - \rho W x\beta + v $$

Given the discussion above, this expression illustrates that, in the presence of an omitted variable that is (i) spatially clustered and (ii) correlated with an included regressor, the true DGP resembles the SDM specification. Although the SDM model provides consistent estimates with $E(\hat\rho) = \rho$ and $E(\hat\theta) = -\rho\beta$, the estimate of $\beta$ is asymptotically biased since $E(\hat\beta) = \beta + \gamma$. Based on these model estimates, the common factor restriction does not hold because of the endogeneity bias present in $E(\hat\beta)$. Hence, a violation of the common factor restriction is indicative of the existence of indirect spillover effects that need to be included in the systematic part of the regression model.
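The restatement above can be checked numerically. The following is a minimal numpy sketch, not the article's simulation code: it assumes a hypothetical row-standardized "line" connectivity matrix (units $i-1$ and $i+1$ are neighbors of $i$) instead of a queen grid, standard-normal $x$ and $v$, and illustrative values $\rho = 0.4$, $\beta = 2$, $\gamma = 0.6$. Setting $\gamma = 0$ reproduces the SEM DGP, with $v$ playing the role of $u_1$.

```python
import numpy as np

def line_weights(n):
    """Hypothetical row-standardized W: units i-1 and i+1 are neighbors of i."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                W[i, j] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def common_factor_check(n=200, rho=0.4, beta=2.0, gamma=0.6, seed=0):
    """Simulate y = x*beta + z with z = (I - rho W)^{-1}(x*gamma + v) and compare
    it with the structural SDM form y = rho W y + x(beta + gamma) - rho W x beta + v."""
    rng = np.random.default_rng(seed)
    W = line_weights(n)
    x = rng.standard_normal(n)
    v = rng.standard_normal(n)
    z = np.linalg.solve(np.eye(n) - rho * W, x * gamma + v)  # omitted, clustered variable
    y = x * beta + z
    beta_star, theta = beta + gamma, -rho * beta              # implied SDM coefficients
    structural_error = y - (rho * (W @ y) + x * beta_star + (W @ x) * theta)
    max_dev = np.max(np.abs(structural_error - v))            # zero up to rounding
    gap = theta + rho * beta_star                             # equals rho * gamma
    return max_dev, gap, np.corrcoef(x, z)[0, 1]

for g in (0.0, 0.6):
    dev, gap, corr = common_factor_check(gamma=g)
    print(f"gamma={g}: max dev={dev:.1e}, theta + rho*beta_star={gap:.2f}, corr(x, z)={corr:.2f}")
```

In both cases the structural form holds exactly; only the correlated case produces a nonzero gap $\theta + \rho(\beta + \gamma) = \rho\gamma$, that is, a violation of the common factor restriction evaluated at the SDM coefficients.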
As this discussion suggests, the spatially lagged exogenous variables in the SDM model specification can also be understood as instruments for omitted variables that are correlated with included regressors (e.g., Elhorst b, ). At the same time, as Gibbons and Overman ( ) emphasize, this strategy provides only weak identification of the model parameters and crucially depends on the exogeneity and the assumed perfect knowledge of W . Consequently, the SDM model does not provide a general solution to the omitted variables problem that would allow researchers to identify causal effects. Just like in any other observational study, doing so requires the application of appropriate research designs (see also Betz et al. ; Rüttenauer ). Moreover, while the SDM specification features global spillover effects, local spillovers would require a different model specification like the spatial Durbin error model (e.g., Halleck Vega and Elhorst ).
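The cross-partial derivatives that define spillovers can be computed directly from SDM parameters. This is a minimal numpy sketch assuming a single regressor and, again, a hypothetical row-standardized "line" $W$. The impact matrix $S = (I_n - \rho W)^{-1}(\beta I_n + \theta W)$ collects $\partial y_i / \partial x_j$; the average direct and indirect summaries follow the convention popularized by LeSage and Pace (mean of the diagonal, and mean row sum minus that).

```python
import numpy as np

def line_weights(n):
    """Hypothetical row-standardized W: units i-1 and i+1 are neighbors of i."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                W[i, j] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def sdm_effects(W, rho, beta, theta):
    """Average direct and indirect effects of x in y = rho W y + x beta + W x theta + e.

    S = (I - rho W)^{-1} (beta I + theta W) holds dy_i/dx_j: its diagonal gives
    direct effects, its off-diagonal elements give spillovers."""
    n = W.shape[0]
    S = np.linalg.solve(np.eye(n) - rho * W, beta * np.eye(n) + theta * W)
    direct = np.trace(S) / n        # average dy_i/dx_i
    total = S.sum() / n             # average row sum of S
    return direct, total - direct   # (average direct, average indirect)

W = line_weights(30)
print(sdm_effects(W, rho=0.5, beta=2.0, theta=-1.0))  # common factor: theta = -rho*beta
print(sdm_effects(W, rho=0.5, beta=2.0, theta=0.5))   # restriction violated
```

In the first call the impact matrix collapses to $\beta I_n$, so the average indirect effect is zero up to rounding; this is the algebraic reason why the SEM implies no spillovers.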
Since this study is primarily concerned with the Wald test's ability to assess the common factor hypothesis, readers are referred to Halleck Vega and Elhorst ( ), Elhorst ( b), Elhorst , or LeSage and Pace ( ) for excellent treatments of alternative spatial regression models.
The number of these common factor restrictions equals the number of regressors included in the model. To ease the exposition, I assume a single regressor throughout this study.

The Wald Test of Nonlinear Restrictions
Given the estimates from an unrestricted model, the Wald test is flexible enough to scrutinize several different and possibly nonlinear parameter constraints within the same model. Intuitively, the Wald test assesses the distance between the observed estimates and the restrictions imposed: the larger the distance, the less plausible the restrictions. In contrast to the LM and LR tests, which constitute prominent and asymptotically equivalent alternative specification tests, the Wald test does not require the estimation of restricted alternative models (e.g., Burridge ). Despite this advantage, a major drawback of the Wald test of nonlinear restrictions is its lack of invariance to algebraically equivalent expressions of the null hypothesis. Since the asymptotic distribution of a nonlinear restriction needs to be approximated by a Taylor series expansion, seemingly identical functional representations may produce different test statistics. Although this undesirable property is a well-known feature of the Wald test (e.g., Gregory and Veall ; Lafontaine and White ; Breusch and Schmidt ; Dagenais and Dufour ; de Paula and Cribari-Neto ; Goh and King ), its consequences for empirical spatial model search strategies have so far been neglected.

Analytical Derivation and Asymptotic Distribution of the Wald Statistic
Consider a situation in which a test needs to be constructed to evaluate a single nonlinear restriction $H_0: g(\lambda) = 0$, where $\lambda$ is a parameter vector and $g(\cdot)$ is a function that is continuously differentiable in a neighborhood of $\lambda$. For this general case, the Wald statistic is defined by

$$ w = g(\hat\lambda)' \left[ \widehat{\mathrm{Var}}\big(g(\hat\lambda)\big) \right]^{-1} g(\hat\lambda). $$

Under the null hypothesis, $w$ asymptotically follows a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions imposed. The only complication involved here is that obtaining $w$ requires knowledge of the sampling distribution of a nonlinear function. While it is straightforward to compute the value of $g(\hat\lambda)$ at the parameter estimates, deriving the Wald statistic additionally requires information about its variability, which depends on the estimator $\hat\lambda$ and the restriction $g(\cdot)$. Given the restriction's nonlinearity, however, exact distributional results are unavailable (Greene ). Instead, the delta method provides an approximation of the restriction's asymptotic distribution based on a first-order Taylor series expansion of $g(\hat\lambda)$ around the true parameter vector $\lambda$. Assuming that $\hat\lambda$ is a consistent estimator with limiting distribution $\sqrt{n}(\hat\lambda - \lambda) \xrightarrow{d} N(0, \Sigma)$ and that the standard regularity conditions hold (see e.g., Newey and McFadden ), the delta method implies that

$$ \sqrt{n}\,\big(g(\hat\lambda) - g(\lambda)\big) \xrightarrow{d} N\big(0,\; G(\lambda)\,\Sigma\,G(\lambda)'\big), $$

where $G(\lambda) = \partial g(\lambda)/\partial\lambda'$ is a row vector of partial derivatives. It follows that the asymptotic distribution of the restriction under $H_0$ is $g(\hat\lambda) \overset{a}{\sim} N\big(g(\lambda),\, G(\lambda)\, n^{-1}\Sigma\, G(\lambda)'\big)$. Using the consistent estimates obtained from an unrestricted model, the restriction's sampling variability derived from the delta method is estimated by

$$ \widehat{\mathrm{Var}}\big(g(\hat\lambda)\big) = G(\hat\lambda)\,\hat\Sigma\,G(\hat\lambda)', $$

with $G(\lambda)$ evaluated at $\hat\lambda$ and $\hat\Sigma$ denoting the estimated variance-covariance matrix of $\hat\lambda$, so that $n\hat\Sigma$ is a consistent estimator of the symmetric, positive definite asymptotic variance-covariance matrix $\Sigma$. Substituting this variance estimate into the definition of $w$ yields the Wald test statistic. Asymptotically, since $\operatorname{plim}_{n\to\infty}\hat\lambda = \lambda$, the function $g(\hat\lambda)$ converges in probability to $g(\lambda)$. At the same time, the necessity to estimate the nonlinear restriction's sampling variability can cause a mismatch between the Wald statistic's asymptotic $\chi^2$ distribution and its finite sample distribution, which has considerable consequences for hypothesis testing (e.g., Lafontaine and White ; Phillips and Park ).

Table: Algebraically identical formulations of the common factor hypothesis.
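The delta-method recipe translates into a few lines of code. This is a sketch for a single restriction, not the article's implementation: it substitutes a central finite-difference gradient for the analytic partial derivatives (a convenience choice), and `cov_hat` plays the role of $\hat\Sigma$, the estimated variance-covariance matrix of $\hat\lambda$.

```python
import numpy as np

def numerical_gradient(g, lam, h=1e-6):
    """Central-difference approximation of the row vector G = dg/dlambda'."""
    lam = np.asarray(lam, dtype=float)
    grad = np.empty_like(lam)
    for k in range(lam.size):
        step = np.zeros_like(lam)
        step[k] = h
        grad[k] = (g(lam + step) - g(lam - step)) / (2 * h)
    return grad

def wald_single(g, lam_hat, cov_hat):
    """Delta-method Wald statistic for one restriction H0: g(lambda) = 0."""
    lam_hat = np.asarray(lam_hat, dtype=float)
    G = numerical_gradient(g, lam_hat)
    var_g = G @ cov_hat @ G            # G(lam_hat) Sigma_hat G(lam_hat)'
    return g(lam_hat) ** 2 / var_g

# Example: H0 rho*beta + theta = 0 at hypothetical estimates (rho, beta, theta)
w = wald_single(lambda l: l[0] * l[1] + l[2], [0.3, 2.0, -0.3], np.diag([0.005, 0.005, 0.01]))
print(w)
```

Because the variance term depends on $G(\hat\lambda)$, any rewriting of $g$ that changes its gradient also changes $w$, which is the source of the noninvariance discussed next.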

The Wald Test of Common Factors in Spatial Models
The analytical results derived above are directly applicable to the empirical assessment of the common factor restriction in spatial regression models because the null hypothesis can be expressed as a nonlinear function of the estimates obtained from an unrestricted SDM model. In order to calculate the test statistic, it is necessary to choose a functional representation of the null hypothesis of common factors. Yet, there are numerous algebraically equivalent parameterizations that satisfy $g(\lambda) = 0$, where $\hat\lambda = [\hat\rho, \hat\beta, \hat\theta]'$ are the estimates obtained from the SDM model. The table of algebraically identical formulations, for example, lists the four alternative expressions of the null hypothesis considered by Gregory and Veall ( ) in a time-series context. While all of these statements declare the same restriction, they produce distinct test statistics and p values in finite samples, because the approximation of the restriction's sampling variability is based on the partial derivatives of the restriction with respect to the parameters. As the right part of that table shows, the vector of partial derivatives differs depending on the exact representation of the null hypothesis.
As a result, alternative expressions of the common factor hypothesis use different estimators of the nonlinear restriction's sampling variability, which yields distinct test statistics and causes them to converge to the asymptotic $\chi^2$ distribution at different rates. In large samples, this circumstance is unproblematic, as the accuracy of the Taylor series approximation increases with sample size and the differences between the alternative variance estimates become negligible. In finite samples, however, the differences between the alternatives can be substantial (e.g., Gregory and Veall ). At worst, alternative and algebraically identical functional representations of the parameter restriction can indicate opposing conclusions regarding its validity, even though the same model estimates are used to calculate the test statistic.
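A small numerical illustration of such conflicting conclusions. The four forms below are an assumption in the spirit of Gregory and Veall, chosen to be consistent with the text's remarks that $\hat\rho$, $\hat\beta$, and $\hat\theta$ appear in the denominators of H0(II), H0(III), and H0(IV), respectively; the exact expressions in the article's table may differ. The estimates and covariance matrix are made up for illustration and are not results from the article.

```python
import math
import numpy as np

# Assumed parameterizations of H0: theta = -rho*beta, with lambda = (rho, beta, theta).
# Each entry: (g, analytic gradient dg/d(rho, beta, theta)).
FORMS = {
    "I":   (lambda r, b, t: r * b + t,     lambda r, b, t: [b, r, 1.0]),
    "II":  (lambda r, b, t: b + t / r,     lambda r, b, t: [-t / r**2, 1.0, 1.0 / r]),
    "III": (lambda r, b, t: r + t / b,     lambda r, b, t: [1.0, -t / b**2, 1.0 / b]),
    "IV":  (lambda r, b, t: r * b / t + 1, lambda r, b, t: [b / t, r / t, -r * b / t**2]),
}

def common_factor_walds(lam_hat, cov_hat):
    """Delta-method Wald statistic and chi2(1) p value for each assumed form."""
    out = {}
    for name, (g, grad) in FORMS.items():
        G = np.array(grad(*lam_hat))
        w = g(*lam_hat) ** 2 / (G @ cov_hat @ G)
        p = math.erfc(math.sqrt(w / 2.0))   # P(chi2_1 > w) = P(|Z| > sqrt(w))
        out[name] = (w, p)
    return out

# Hypothetical SDM estimates (rho, beta, theta) and their covariance matrix
lam_hat = (0.3, 2.0, -0.3)
cov_hat = np.diag([0.005, 0.005, 0.01])
for name, (w, p) in common_factor_walds(lam_hat, cov_hat).items():
    print(f"H0({name}): w = {w:.3f}, p = {p:.3f}")
```

With these numbers, only H0(II) exceeds the asymptotic critical value of 3.841, so the same estimates lead to rejection under one formulation and non-rejection under the other three.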
Importantly, while many statistical software packages used to estimate spatial regression models, like Stata or R packages, report results from a Wald test by default, they do not test for common factors. Instead, the null hypothesis these packages evaluate is cross-sectional independence, that is, $\rho = 0$. Since this is a linear restriction, the Wald statistic's noninvariance problem does not arise, and the tests these packages perform are only indicative of the presence of nonrandom spatial clustering. They do not permit any inferences regarding the spatial process at work. Hence, scrutinizing the common factor restriction requires researchers to amend the Wald test's null hypothesis.
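Why a linear restriction such as $\rho = 0$ is immune to the problem can be seen in a short sketch: rescaling a linear restriction $R\lambda = r$ by any constant $c \neq 0$ multiplies the numerator by $c^2$ and the delta-method variance by $c^2$, so the two cancel. The estimates and covariance matrix in the usage example are hypothetical.

```python
import numpy as np

def wald_linear(R, lam_hat, cov_hat, r=0.0):
    """Wald statistic for a single linear restriction R @ lambda = r."""
    R = np.asarray(R, dtype=float)
    diff = R @ lam_hat - r
    return diff ** 2 / (R @ cov_hat @ R)

lam_hat = np.array([0.12, 2.0, -0.3])              # hypothetical (rho, beta, theta)
cov_hat = np.array([[0.004, 0.001, 0.0],
                    [0.001, 0.010, 0.002],
                    [0.0,   0.002, 0.020]])
print(wald_linear([1, 0, 0], lam_hat, cov_hat))    # H0: rho = 0
print(wald_linear([5, 0, 0], lam_hat, cov_hat))    # H0: 5*rho = 0, same statistic
```

For a nonlinear restriction, by contrast, an equivalent rewriting changes the gradient nonproportionally, which is what drives the differences among the common factor formulations.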

Modifications of the Wald Test
Since the application of the Taylor series expansion results in distinct Wald statistics for algebraically equivalent formulations of the null hypothesis, the asymptotic χ 2 distribution might constitute an inappropriate approximation of the statistic's finite sample distribution for some parameterizations. Therefore, modifications proposed in the literature that attempt to address the Wald test's noninvariance problem primarily focus on adjusting the statistic's reference distribution.
Phillips and Park ( ), for example, show that an Edgeworth expansion of the Wald statistic provides additional information on the statistic's distribution, which can be used to obtain corrected critical values and modified test statistics for each functional representation of the null hypothesis (e.g., de Paula and Cribari-Neto ; King and Goh ). Besides these corrections, simulation techniques allow researchers to generate the empirical distribution under the null hypothesis for each specification of the common factor restriction and to base inferences on these reference distributions (e.g., Lafontaine and White ; Goh and King ). In particular, bootstrap methods provide a way to estimate critical values and use them as an alternative to the (corrected) asymptotic critical values, which can be unreliable in finite samples (Godfrey and Veall ). Using the following procedure, it is straightforward to derive bootstrap critical values to test for common factors in spatial regression models:
1. Use the estimates from an unrestricted SDM model and calculate the observed Wald statistic $w$ for a given restriction.
2. Estimate the restricted SEM model. With these estimates and the SEM DGP, generate bootstrap samples by resampling with replacement from the residual vector to obtain bootstrap disturbances.
3. Repeat step 1 for each bootstrap sample and store the Wald statistic in the vector $\tilde{w}$.
4. Sort $\tilde{w}$ in ascending order. The value with rank $(1 - \alpha) \times 100 + 1$ is the estimated bootstrap critical value, $\chi^2_{boot}$, corresponding to a predefined α-level (e.g., α = 0.05).
By comparing $w$ calculated in step 1 to the corresponding bootstrap critical value $\chi^2_{boot}$ from step 4, its statistical significance can be assessed. This procedure can be repeated for any functional representation of the common factor restriction in order to obtain individual bootstrap critical values for each restriction and a given region of the parameter space.
Thereby, researchers can base inferences on the empirical distribution under the null hypothesis instead of relying on the asymptotic $\chi^2$ distribution. This is especially important since the performance of the Wald test depends not only on the specific expression of the common factor hypothesis but also on the particular region of the parameter space. In fact, previous research shows that no single formulation of the restriction consistently outperforms all alternatives (e.g., Gregory and Veall ; Lafontaine and White ; Phillips and Park ). While Goh and King ( ) demonstrate that both asymptotic modifications (the corrected critical values and the improved test statistics) might even worsen the Wald test's power and size properties, they conclude that the bootstrap approach constitutes a useful improvement for applied research (see also Lafontaine and White ; Godfrey and Veall ; King and Goh ). Of course, neither the Edgeworth corrections nor simulation techniques completely resolve the noninvariance problem inherent to the Wald test. Doing so requires the application of alternative tests, such as the asymptotically equivalent LR test, which is invariant to such reparameterizations (e.g., Mur and Angulo ). Yet, by providing corrections for the Wald test's empirical size, these modifications reduce the possibility of intentionally manipulating the result by amending the functional expression of the null hypothesis (King and Goh , ).
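The resampling and sorting steps of the bootstrap procedure can be sketched as follows. This is a sketch under stated assumptions, not the article's code. `wald_stat` is a hypothetical user-supplied callable that re-estimates the SDM on a bootstrap sample and returns the Wald statistic for one formulation. The residuals are assumed to be the SEM innovations $(I_n - \hat\rho W)(y - x\hat\beta)$, and centering them is a common bootstrap convention that the text does not specify. The rank rule reads the text's $(1 - \alpha) \times 100 + 1$ with $B$ replications in place of 100, which is also an assumption.

```python
import numpy as np

def bootstrap_critical_value(x, W, beta_hat, rho_hat, resid, wald_stat,
                             B=999, alpha=0.05, seed=None):
    """Residual bootstrap of a Wald statistic under the restricted SEM.

    beta_hat, rho_hat : estimates from the restricted SEM
    resid             : SEM innovations, e.g. (I - rho_hat W)(y - x beta_hat)
    wald_stat         : hypothetical callable (y, x, W) -> Wald statistic
    """
    rng = np.random.default_rng(seed)
    n = x.size
    A = np.eye(n) - rho_hat * W
    e = resid - resid.mean()                            # centered residuals
    w_boot = np.empty(B)
    for b in range(B):
        e_star = rng.choice(e, size=n, replace=True)    # resample with replacement
        y_star = x * beta_hat + np.linalg.solve(A, e_star)  # SEM DGP under H0
        w_boot[b] = wald_stat(y_star, x, W)
    w_boot.sort()                                       # ascending order
    rank = int((1 - alpha) * B) + 1                     # 1-based rank of the critical value
    return w_boot[rank - 1]
```

The observed statistic $w$ from the unrestricted SDM is then compared with the returned value instead of with 3.841. Because `wald_stat` is passed in, the same function yields a separate bootstrap critical value for each formulation of the restriction.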

Monte Carlo Analysis
Experimental Setup
In order to investigate the finite sample performance of the Wald test, I conduct Monte Carlo experiments in which I vary the sample size, the strength of the interdependence, and the severity of the omitted variables bias through the degree of correlation between the included and the omitted regressor. Using the spatial process with a correlated omitted variable described above (which reduces to the SEM DGP for $\gamma = 0$), I generate 1,000 samples of the outcome vector $y$ for each parameter configuration. In the simulations, $\beta = 2$ and $\sigma^2_v = 1$ are held constant, and $x$ is drawn from a standard normal distribution. The parameter $\gamma$ ranges from 0 to 1 in steps of 0.2, while $\rho$ takes on values between 0 and 0.8 in steps of 0.2. This setup includes a nonspatial DGP without omitted variables bias ($\rho = 0$ and $\gamma = 0$), nonspatial DGPs with omitted variables bias ($\rho = 0$ and $\gamma > 0$), SEM DGPs ($\rho > 0$ and $\gamma = 0$), and SDM DGPs ($\rho > 0$ and $\gamma > 0$). $W$ is a row-stochastic contiguity matrix based on the queen criterion of adjacency. In contrast to the rook connectivity scheme, which links spatial units ordered on a lattice to their direct horizontal and vertical neighbors, the queen criterion additionally connects the units to their diagonal neighbors (e.g., Cliff and Keith Ord ). The sample sizes specified here contain 49, 100, 225, and 400 observations distributed on regular grids (7 × 7, 10 × 10, 15 × 15, and 20 × 20) to realistically reflect the small- to medium-sized samples frequently encountered in political science.
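The connectivity structure and parameter grid can be reproduced in a few lines. This numpy sketch (function names are hypothetical; the article uses the R package spdep) builds the binary queen-contiguity matrix on a $k \times k$ lattice, row-standardizes it, and computes the admissible interval $(\omega_{\min}^{-1}, \omega_{\max}^{-1})$ for $\rho$. It uses the fact that the row-standardized $W = D^{-1}C$ is similar to the symmetric matrix $D^{-1/2} C D^{-1/2}$, so its eigenvalues are real.

```python
import numpy as np

def queen_binary(k):
    """Binary queen-contiguity matrix C for a k x k lattice (row-major unit ids)."""
    n = k * k
    C = np.zeros((n, n))
    for r in range(k):
        for c in range(k):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < k and 0 <= cc < k:
                        C[r * k + c, rr * k + cc] = 1.0
    return C

def rho_bounds(C):
    """(1/omega_min, 1/omega_max) for the row-standardized W = D^{-1} C."""
    d = C.sum(axis=1)
    S = C / np.sqrt(np.outer(d, d))        # symmetric, same eigenvalues as W
    omega = np.linalg.eigvalsh(S)
    return 1.0 / omega.min(), 1.0 / omega.max()

gammas = np.round(np.arange(0.0, 1.01, 0.2), 1)   # 0.0, 0.2, ..., 1.0
rhos = np.round(np.arange(0.0, 0.81, 0.2), 1)     # 0.0, 0.2, ..., 0.8
grids = (7, 10, 15, 20)                           # n = 49, 100, 225, 400

for k in grids:
    C = queen_binary(k)
    W = C / C.sum(axis=1, keepdims=True)
    lower, upper = rho_bounds(C)
    print(k * k, np.allclose(W.sum(axis=1), 1.0), round(lower, 3), round(upper, 3))

print(len(gammas) * len(rhos) * len(grids), "parameter configurations")
```

Since $W$ is row-stochastic, the upper bound is 1, so every $\rho$ value in the grid lies inside the admissible interval.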
Since the consequences of model misspecification for the estimation of unbiased effect estimates have been studied elsewhere (e.g., LeSage and Pace ; Pace and LeSage ; Lacombe and LeSage ; Rüttenauer ), this Monte Carlo analysis focuses on the ability of the Wald test to identify the true spatial model and to differentiate between substantive and residual dependence across a range of alternative DGPs. To this end, I investigate the performance of the Wald test using the four alternative null hypotheses of common factors summarized in the table of algebraically identical formulations.
Performance of the Original Wald Test
The corresponding results table reports the rejection rates of the four expressions of the null hypothesis of common factors at an α-level of 0.05 across the simulations when the true DGP is that of the SEM model ($\gamma = 0$). Since there are no omitted spillovers in this scenario, the common factor restriction holds, and the four variants of the Wald test are expected to reject the true null hypothesis in about 5% of the simulation trials, with a 95% confidence interval of [3.65%; 6.35%].
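Both benchmark numbers can be reproduced with the Python standard library. The sketch assumes the normal-approximation (Wald) interval for a binomial proportion, which matches the reported bounds, and obtains the $\chi^2_1$ critical value as the squared two-sided standard normal quantile.

```python
import math
from statistics import NormalDist

def rejection_rate_ci(alpha=0.05, reps=1000, level=0.95):
    """Normal-approximation binomial CI for an empirical rejection rate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(alpha * (1 - alpha) / reps)
    return alpha - z * se, alpha + z * se

crit = NormalDist().inv_cdf(1 - 0.05 / 2) ** 2   # chi2(1) 95% quantile = z_{0.975}^2
lo, hi = rejection_rate_ci()
print(f"critical value: {crit:.3f}")             # 3.841
print(f"95% CI: [{lo:.2%}; {hi:.2%}]")           # [3.65%; 6.35%]
```

Rejection rates outside this band across 1,000 replications therefore signal a size distortion rather than simulation noise.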
Although Section analytically shows that the alternative Wald tests are asymptotically equivalent, their type I error rates differ notably in finite samples. H₀(II) in particular, but also H₀(IV), deviates considerably from the expected error rate. Across all sample sizes, H₀(II) is too conservative when ρ is small. Since ρ̂ appears in the restriction's denominator (see Table ), the restriction and its derivatives are undefined at ρ̂ = 0, which violates the assumption of continuous derivatives. However, the Wald test based on H₀(II) remains valid, as its asymptotic distribution is obtained under the null hypothesis, which precludes the problematic value (Gregory and Veall ).
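To see how algebraically equivalent restrictions yield different Wald statistics from identical estimates, the sketch below applies the delta method to the SDM y = ρWy + xβ + Wxθ + ε, whose common factor restriction is θ = −ρβ. The four functional forms are an assumption chosen to match the text (ρ̂ in the denominator of H₀(II), β̂ in H₀(III), θ̂ in H₀(IV)); they are not necessarily the exact expressions of the paper's Table. The estimates and covariance matrix are made up for illustration.

```python
import numpy as np
from scipy.stats import chi2


def common_factor_wald(rho, beta, theta, V):
    """Delta-method Wald statistics for four (assumed) equivalent forms of
    the restriction theta = -rho * beta. V is the estimated covariance
    matrix of (rho, beta, theta)."""
    forms = {
        # name: (g(estimates), gradient of g w.r.t. (rho, beta, theta))
        "I": (theta + rho * beta, [beta, rho, 1.0]),
        "II": (theta / rho + beta, [-theta / rho**2, 1.0, 1.0 / rho]),
        "III": (theta / beta + rho, [1.0, -theta / beta**2, 1.0 / beta]),
        "IV": (rho * beta / theta + 1.0,
               [beta / theta, rho / theta, -rho * beta / theta**2]),
    }
    stats = {}
    for name, (g, grad) in forms.items():
        grad = np.asarray(grad)
        # W = g^2 / (grad' V grad), asymptotically chi2 with df = 1 under H0
        stats[name] = g**2 / (grad @ V @ grad)
    return stats


# Hypothetical SDM estimates (rho, beta, theta) and their covariance matrix.
V = np.array([[0.004, 0.0, -0.002],
              [0.0, 0.010, -0.001],
              [-0.002, -0.001, 0.020]])
stats = common_factor_wald(0.4, 2.0, -0.6, V)
crit = chi2.ppf(0.95, df=1)
for name, w in stats.items():
    print(f"H0({name}): W = {w:.3f}, reject = {w > crit}")
```

All four forms share the same zero set, so they coincide exactly when θ̂ = −ρ̂β̂; away from that point, the linearization of each g differs and so does the statistic.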
Supplementary Material B contains information on the correlation between the regressor x and the omitted variable z for the different values of γ.
Since W is row-stochastic, any value of ρ in the interval (1/ω_min, 1), where ω_min denotes the smallest eigenvalue of W, ensures that (I − ρW) is invertible (LeSage and Pace ; Elhorst b).
To perform the simulations, I rely on resources from the High Performance Computing Cluster bwHPC and use the R package spdep (Bivand and Piras ) to estimate all spatial regression models. Replication materials are available on the Political Analysis Dataverse (Juhl a).
For an investigation of the substantive effects of spatial misspecification bias in nonspatial OLS and SEM models, see Supplementary Material C.
Supplementary Material C contains additional robustness tests, including a scenario with negative spatial autocorrelation (C. ), an alternative specification of the connectivity scheme, and an investigation of possible edge effects (C. ).
This situation also occurs at β̂ = 0 for H₀(III) and at θ̂ = 0 for H₀(IV).
Based on the χ² distribution with df = 1 and α = 0.05, the asymptotic critical value used for all variants of the Wald test and across the different levels of spatial autocorrelation is χ²_asym = 3.841. The theoretically expected rejection rate across the simulation iterations is 5%, with a 95% (binomial proportion) confidence interval of [3.65%; 6.35%].
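The asymptotic critical value χ²_asym = 3.841 and an empirical rejection rate can be computed as follows. This sketch assumes scipy; `rejection_rate` is a hypothetical helper, and the sanity check feeds it draws from the χ²(1) reference distribution itself rather than Wald statistics from fitted spatial models.

```python
import numpy as np
from scipy.stats import chi2

alpha = 0.05
crit_asym = chi2.ppf(1 - alpha, df=1)  # asymptotic critical value, df = 1
print(round(crit_asym, 3))  # → 3.841


def rejection_rate(wald_stats, critical_value):
    """Share of simulated Wald statistics exceeding the critical value."""
    return float(np.mean(np.asarray(wald_stats) > critical_value))


# Sanity check: statistics that truly follow chi2(1) are rejected about 5% of the time.
rng = np.random.default_rng(1)
draws = chi2.rvs(df=1, size=100_000, random_state=rng)
rate = rejection_rate(draws, crit_asym)
print(rate)
```

In the Monte Carlo study, `wald_stats` would instead hold the 1,000 statistics for one formulation of the restriction in one parameter configuration.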
At the same time, incorrectly rejecting a true null hypothesis might be less problematic in this case, since the SDM yields unbiased impact estimates even if only the residuals are spatially clustered and no substantive spillovers exist (e.g., Elhorst ). The only drawback is that the appropriate SEM specification would be more efficient, which might affect inferences regarding the statistical significance of a regressor's impact. The more consequential error is failing to reject a false common factor hypothesis, because it leads researchers to a SEM or nonspatial model whose impact estimates are biased when an SDM process generated the data. Consequently, it is crucial for any test of the common factor hypothesis to have satisfactory power in order to reduce concerns about biased effect estimates.
Against this background, Figure compares the power of the alternative variants of the Wald test for different levels of correlation between the spatially dependent omitted variable and the included regressor. As the correlation increases, the tests should be more likely to reject the null hypothesis. To account for the effects of the sample size and the strength of the interdependence on the performance of the tests, Figure consists of several panels. In each panel, the horizontal axis depicts the different values of γ, and the vertical axis shows the observed share of rejections across the simulation trials.
A brief inspection of Figure already confirms that alternative parameterizations of the null hypothesis, although algebraically equivalent, yield strikingly different results in finite samples. Even with a reasonably large sample, there are pronounced differences in the rejection rates of the four Wald tests. While H₀(I), which is considered the common way to express the restriction (Gregory and Veall , ), and H₀(III) perform comparatively well in these simulations, the specifications based on H₀(II) and H₀(IV) have inferior power properties. The remarkably low rejection rates of these specifications of the Wald test increase the likelihood that researchers incorrectly infer the absence of meaningful spillover effects.
Moreover, the behavior of H₀(IV) differs greatly from expectations, as its rejection rate initially decreases in almost all parameter settings as γ increases. This phenomenon, known as nonmonotonicity of the power function, makes the rejection of the null hypothesis even less likely as the difference between the true DGP and the restriction increases (King and Goh , -). In practice, these tests would suggest that the data was generated by a DGP with spatially correlated errors even if there are sizable spillover effects. Researchers would incorrectly conclude that a SEM model or even a nonspatial OLS model appropriately represents the unobservable DGP. Given that these model specifications produce biased impact estimates if an SDM process generated the data, the low rejection rates are highly problematic for substantive inferences.
Although the different variants of the Wald test use the same data and identical parameter estimates, this simulation study shows that, depending on the functional representation of the null hypothesis, they can reach contradictory conclusions regarding the validity of the common factor restriction. In fact, Breusch and Schmidt ( ) show analytically that it is possible to obtain any desired Wald statistic by appropriately specifying the restriction, which opens up the possibility of intentionally manipulating the test result (see also King and Goh ). Therefore, any search strategy utilizing the Wald test, like the basic general-to-specific approach or the multistep procedure suggested by Elhorst ( a), is vulnerable to this problem. Since there is no theoretically justified functional representation of the common factor hypothesis, and given the strikingly large share of inconsistent inferences across a range of parameter settings, the evidence presented here strongly cautions against the use of the standard Wald test based on an asymptotic reference distribution.
The Wald test based on H₀(IV) is also strongly affected by β in the DGP. As Supplementary Material C. shows, the test's performance is even worse for smaller values of β. Supplementary Material C. provides a more detailed discussion of these inconsistencies and identifies regions of the parameter space where the alternative Wald tests diverge most frequently.
For the different levels of spatial autocorrelation, the median bootstrap critical values χ²_boot for each variant of the Wald test at the nominal significance level of 5% are displayed in parentheses. Again, the theoretically expected rejection rate across the simulation trials is 5% [3.65%; 6.35%].

Performance of the Modified Wald Test Based on Bootstrap Critical Values
While the simulations performed here illustrate that the standard Wald test based on asymptotic critical values is unreliable for the identification of the unobservable spatial process, this section investigates whether the application of simulated reference distributions improves the test's performance. To this end, I use the bootstrap procedure outlined in Section . and compare the observed Wald statistics based on the different formulations of the common factor restriction to their estimated critical values.
The results reported in Table show that the bootstrap critical values χ²_boot, displayed in parentheses, not only differ from the asymptotic critical value χ²_asym = 3.841 on which the original Wald test is based but also differ considerably across the alternative parameterizations of the null hypothesis. While the estimated critical values for H₀(I) and H₀(III) are always higher than χ²_asym, the simulated null distributions of H₀(II) suggest much smaller critical values for this expression of the null hypothesis in most scenarios.
Since the functional expression of the nonlinear common factor restriction determines the rate at which the Wald statistic converges to its asymptotic χ² distribution, estimating critical values for each alternative parameterization improves the empirical size of the Wald test in finite samples. Compared to the original tests based on the asymptotically derived critical value, the observed rejection rates of all four variants of the Wald test are closer to the nominal significance level of 5% across all sample sizes. Whereas the observed rejection rate of H₀(I) ranges from 6.9% to 11% across the different values of ρ for a sample size of n = 49 when relying on the asymptotic χ² distribution (see Table ), its range narrows to . %-. % when using bootstrap critical values. Similarly, the bootstrap critical values even improve the size of H₀(IV), which performed poorly under the asymptotic reference distribution. For n = 49, basing inferences on the simulated null distribution narrows the range of rejection rates from . %-. % to . %-. % across the different levels of spatial autocorrelation.
In conclusion, the Monte Carlo evidence presented here demonstrates that using the simulated null distribution as the reference distribution and basing inferences on estimated rather than asymptotically derived critical values ameliorates the problems posed by the Wald test's lack of invariance to alternative parameterizations of the common factor hypothesis. Since the bootstrap critical values account for differences in the convergence rates of the Wald statistics, this modification constitutes a superior alternative that facilitates the empirical assessment of the common factor restriction in spatial regression models. Alternatively, researchers can use the LR test, which is invariant to such reparameterizations (Godfrey and Veall ). Hence, irrespective of the empirical model search strategy employed, researchers should use either the modified Wald test based on the simulated null distribution or the LR test to evaluate the appropriateness of the spatial model employed.
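Both remedies can be illustrated without the machinery of spatial ML estimation. The sketch below swaps in a toy nonspatial regression y = b0 + b1·x1 + b2·x2 + e with the nonlinear restriction b1·b2 = 1, written in two algebraically equivalent ways. It computes (1) a parametric bootstrap critical value for each formulation by simulating from the restricted fit with Gaussian errors, and (2) the Gaussian LR statistic n·log(SSR_r/SSR_u), which depends only on the restricted and unrestricted fits and therefore not on how the restriction is written. The model, parameter values, search bounds (b1 assumed positive), and bootstrap design are all assumptions and not necessarily the procedure used in the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2


def ols(X, y):
    """OLS coefficients, their covariance matrix, and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    s2 = resid @ resid / (n - k)
    return beta, s2 * np.linalg.inv(X.T @ X), resid @ resid


def wald_stats(beta, V):
    """Delta-method Wald statistics for two equivalent forms of b1 * b2 = 1."""
    _, b1, b2 = beta
    forms = {
        "product": (b1 * b2 - 1.0, np.array([0.0, b2, b1])),
        "ratio": (b1 - 1.0 / b2, np.array([0.0, 1.0, 1.0 / b2**2])),
    }
    return {name: g**2 / (grad @ V @ grad) for name, (g, grad) in forms.items()}


def restricted_fit(X, y):
    """Least squares under b2 = 1 / b1, profiling out the intercept b0."""
    x1, x2 = X[:, 1], X[:, 2]

    def ssr(b1):
        r = y - b1 * x1 - x2 / b1
        return np.sum((r - r.mean()) ** 2)

    b1 = minimize_scalar(ssr, bounds=(0.05, 20.0), method="bounded").x
    b0 = np.mean(y - b1 * x1 - x2 / b1)
    return np.array([b0, b1, 1.0 / b1]), ssr(b1)


def bootstrap_critical_values(X, y, n_boot=999, alpha=0.05, seed=0):
    """Parametric bootstrap: simulate under H0 and take each form's (1 - alpha) quantile."""
    rng = np.random.default_rng(seed)
    beta_r, ssr_r = restricted_fit(X, y)
    sigma_r = np.sqrt(ssr_r / len(y))  # ML error standard deviation under H0
    draws = {"product": [], "ratio": []}
    for _ in range(n_boot):
        y_star = X @ beta_r + rng.normal(0.0, sigma_r, size=len(y))
        beta_b, V_b, _ = ols(X, y_star)
        for name, w in wald_stats(beta_b, V_b).items():
            draws[name].append(w)
    return {name: np.quantile(w, 1 - alpha) for name, w in draws.items()}


rng = np.random.default_rng(42)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=n)  # b1 * b2 = 1, so H0 is true

beta_u, V_u, ssr_u = ols(X, y)
observed = wald_stats(beta_u, V_u)
_, ssr_r = restricted_fit(X, y)
lr = n * np.log(ssr_r / ssr_u)  # one LR statistic, whatever form the restriction takes
crit_boot = bootstrap_critical_values(X, y)
crit_asym = chi2.ppf(0.95, df=1)
for name, w in observed.items():
    print(f"{name}: W = {w:.3f}, asymptotic reject = {w > crit_asym}, "
          f"bootstrap reject = {w > crit_boot[name]}")
print(f"LR = {lr:.3f}, reject = {lr > crit_asym}")
```

Each formulation is compared with its own bootstrap critical value, which is what allows the bootstrap to absorb the formulation-specific finite sample distortions.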

Empirical Example: Spatial Contagion Effects in Economic Voting
An empirical example helps to demonstrate the consequences of the problem for applied research aiming to evaluate the empirical evidence for a theorized mechanism while ruling out alternative mechanisms. To this end, I reanalyze a study conducted by Williams and Whitten ( ) that investigates spatial contagion effects, understood as the process by which "[. . .] a policy success or failure of one political party in the eyes of voters similarly affects those parties that are ideologically proximate" (Williams and Whitten , ). The utilization of different sample sizes and the availability of a plausible alternative mechanism make this study an ideal case to investigate the consequences of the Wald test's lack of invariance to reformulations of the common factor hypothesis.
Williams and Whitten ( ) argue that the electorate does not merely reward or punish the parties forming the current government at the ballot box for the country's economic performance, as the economic voting hypothesis predicts. Since voters group parties based on their ideological stances, the effect of economic prosperity also spills over to ideologically proximate parties, irrespective of whether these parties belong to the government. These indirect effects conjectured by the authors link the economic well-being of a country to the electoral performance of opposition parties. The study thereby contributes to the literature on electoral competition by combining insights from the hitherto separate literatures on economic voting and spatial party competition.
To assess the empirical support for the proposed mechanism, Williams and Whitten ( , -) analyze data on electoral contests in parliamentary democracies from to , where the parties constitute the unit of analysis. The change in a party's vote share between two consecutive elections is the dependent variable, and the country's economic performance, measured by real GDP per capita growth, is the main regressor of interest. In their study, the authors emphasize the importance of spatial regression models, which facilitate the estimation of the theoretically expected contagion effects in the form of spatial spillovers. They choose the SAR model specification in order to quantify (global) spillover effects (Williams and Whitten , -). In line with the economic voting literature, which suggests that the voters' ability to clearly attribute responsibility for economic (mis)fortune is a necessary precondition for economic voting, they estimate separate SAR models for elections with high (n = 398) and low levels of clarity (n = 1,030). While economic voting itself should be less pronounced in elections characterized by a low clarity of responsibility, because it is harder for voters to hold a party accountable for the country's economic performance, the authors expect to find larger spatial contagion effects in this context. The study argues that in low clarity settings, the electorate is more experienced in switching its support from one party to an ideologically similar party. Therefore, the voters' sophistication in reallocating their support creates stronger interdependencies between parties in low clarity elections than in high clarity settings, where voters can easily identify the party that is responsible for the national state of the economy (Williams and Whitten , -). Given these theoretical expectations, the SAR model constitutes an appropriate choice, as it links the electoral fortune of a party to the performance of the other parties and allows researchers to distinguish between direct and indirect effects of economic prosperity on the parties' vote shares. Yet, while unfocused diagnostics like Moran's I indicate the existence of spatial interdependencies, it is possible that an alternative spatial process caused the clustering detected in the data. Unmodeled election-specific particularities that are unrelated to a country's economic performance, such as the general appeal of a candidate or political scandals, might also affect the election outcome of ideologically proximate parties. Since these factors are not part of the regression's systematic part, they potentially cause spatial clustering in the residuals. Instead of substantively meaningful contagion effects, this plausible alternative process implies no indirect effect of economic performance but a mere diffusion of shocks, which the SEM model would adequately capture. Consequently, there is a risk that the SAR models specified by the authors lead to incorrect inferences regarding the existence of contagion effects.
The application of bootstrap critical values can also improve the Wald test's power, as Supplementary Material C. shows. Supplementary Material C. verifies the good performance of the LR test in the Monte Carlo simulations conducted here.
To demonstrate the substantive differences between the two alternative spatial processes, Figure displays the estimated direct and indirect effects of the main regressor of interest, economic performance, on the vote share of opposition parties, derived from the SAR, SEM, and SDM models. According to the theory, spatial contagion effects should mitigate the negative effect of a strong economy on opposition parties, because the beneficial effect of positive economic conditions for governing parties spills over to ideologically neighboring opposition parties. In contrast, if merely the errors are spatially correlated, no contagion takes place, and only a direct negative effect of a country's economic performance exists for opposition parties.
Despite a significant spatial parameter estimate, Figure illustrates that the alternative spatial models suggest no indirect spillover effect of economic growth in low clarity elections. In high clarity elections, only the SAR model identifies significant spillover effects. While the SEM model assumes no spillovers, the average indirect impact of economic growth on the change in vote share for opposition parties as estimated by the SAR model is 0.020, with a simulated 95% confidence interval of [0.004; 0.045]. In contrast, the estimate derived from the SDM specification is 0.025 [−0.208; 0.244], suggesting no significant spillovers. Besides the SDM model's considerable efficiency loss, this example illustrates that the identification of the theorized contagion effects is contingent on the specification of the underlying spatial process. Although the SAR and SEM models produce similar and statistically indistinguishable total impact estimates of economic performance in high clarity elections, the results have very different theoretical implications.
Williams and Whitten ( ) present a more detailed discussion of the dataset as well as a comprehensive derivation of their theoretical expectations regarding spatial contagion effects.
Table in the study by Williams and Whitten ( , ) reports Moran's I and Geary's C tests of spatial interdependencies, both of which indicate spatial autocorrelation. Note that, while this table also reports results of a Wald test, this is not the Wald test of common factors but rather a test of the null hypothesis of no spatial dependence (see Section . ).
While Table in the original work only contains the prespatial marginal effects, the estimates presented here explicitly disentangle direct and indirect (or contagion) effects. This highlights the necessity to base substantive inferences on impact rather than coefficient estimates. Elhorst ( ) and LeSage and Pace ( ) provide a more detailed discussion of this important issue.
Notably, the overall impact of economic growth on an opposition party's vote share in the SEM model consists solely of the direct impact of x_i on y_i. In contrast, the SAR model also identifies significant indirect impacts. Therefore, while the SAR model supports the theory of spatial contagion effects, the SEM model implies no substantive spillover effects, which highlights the importance of adequately distinguishing between these alternative processes for substantive inferences.
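The direct and indirect impacts compared above can be sketched for an SAR model y = ρWy + xβ + ε: the impact matrix for x is (I − ρW)⁻¹β, the average direct impact is the mean of its diagonal, the average total impact is its mean row sum (β/(1 − ρ) for a row-stochastic W), and the indirect impact is the difference (LeSage and Pace). In the SEM, the total impact is simply the direct impact β. Following the simulation approach described in the notes, the sketch draws coefficient sets from a multivariate normal distribution to obtain simulated 95% intervals. The ring-shaped W, the estimates, and the covariance matrix are hypothetical and unrelated to the Williams and Whitten data; the sketch also does not discard draws of ρ outside the admissible interval.

```python
import numpy as np


def ring_weights(n: int) -> np.ndarray:
    """Hypothetical row-standardized W: each unit neighbors i - 1 and i + 1 (circular)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 1.0
        W[i, (i + 1) % n] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def sar_impacts(rho: float, beta: float, W: np.ndarray):
    """Average direct, indirect, and total impacts of x in an SAR model."""
    n = W.shape[0]
    S = np.linalg.inv(np.eye(n) - rho * W) * beta  # impact matrix (I - rho W)^-1 beta
    direct = np.trace(S) / n  # mean own-unit effect
    total = S.sum() / n  # mean row sum
    return direct, total - direct, total


def simulate_impacts(est, vcov, W, n_draws=1000, seed=0):
    """Percentile 95% intervals of impacts from draws of (rho, beta) ~ MVN(est, vcov)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(est, vcov, size=n_draws)
    sims = np.array([sar_impacts(r, b, W) for r, b in draws])
    return np.percentile(sims, [2.5, 97.5], axis=0)  # rows: lower, upper


W = ring_weights(30)
est = np.array([0.3, -0.5])  # hypothetical (rho, beta)
vcov = np.array([[0.01, 0.0], [0.0, 0.04]])  # hypothetical covariance matrix
direct, indirect, total = sar_impacts(est[0], est[1], W)
ci = simulate_impacts(est, vcov, W)
print(f"direct = {direct:.3f}, indirect = {indirect:.3f}, total = {total:.3f}")
print("95% intervals (columns: direct, indirect, total):")
print(ci)
```

Because the impacts are nonlinear in ρ, the simulated intervals are generally asymmetric, which is why impacts are summarized by simulation rather than by the coefficient standard errors alone.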
In order to address the problem of model misspecification and to empirically distinguish the two plausible spatial processes, I implement the Wald test of common factors using the SDM model estimates and the four specifications of the common factor restriction outlined in Table . If the data supports the theory of indirect contagion effects, the tests should reject the null hypothesis. Yet, Table illustrates that in both high and low clarity contexts, the alternative Wald tests not only differ in their test statistics but, based on the asymptotic critical values, also reach substantively different conclusions regarding the existence of spillover effects. While H₀(IV) supports the theory proposed by Williams and Whitten ( ), the other three versions of the Wald test fail to reject the common factor hypothesis. Instead of substantively meaningful spillovers, these tests only indicate residual dependence, which implies that no spatial contagion takes place among the parties. Given the rather large number of observations in the low clarity scenario, these differences are all the more alarming. By contrast, when inferences regarding the underlying spatial process are based on the simulated null distribution of each parameterization of the common factor restriction, all four variants of the Wald test fail to reject the null hypothesis at conventional significance levels.
Taken together, this empirical case study confirms the Monte Carlo evidence by demonstrating that relying on bootstrap critical values to identify statistically significant deviations from the Wald statistic's null distribution improves its finite sample performance and alleviates the conflict between alternative parameterizations of the nonlinear common factor hypothesis. While the tests based on the asymptotic χ² distribution reach contradictory conclusions even with a sample size of more than 1,000 observations, the bootstrap procedure corrects this problem. Regarding the theorized mechanism, this analysis finds insufficient evidence to convincingly dispel doubts that correlation in the residuals, rather than the theorized spatial contagion effects, caused the spatial clustering found in the data.
In order to appropriately account for sampling uncertainty, I use the point estimates and the variance-covariance matrices obtained from the different spatial models to set up multivariate normal distributions from which I sample 1,000 sets of coefficients.
... and on bootstrap critical values χ²_boot, respectively. While each restriction has an individual simulated critical value, they all share a single asymptotic critical value, which depends on the α-level and the number of degrees of freedom.

Conclusion
Distinguishing substantively meaningful indirect spillover effects from a mere diffusion of random shocks is essential as there is a serious risk of making incorrect inferences when estimating a misspecified model. Yet, the task of appropriately modeling the process underlying observable patterns of interrelatedness between the units poses notable difficulties for political scientists. Although many empirical specification search procedures rely on the Wald test to assess the nonlinear common factor restriction, the test's lack of invariance to algebraically equivalent formulations of the null hypothesis poses a serious problem for the accuracy of inferences.
This study investigates the consequences of the Wald test's sensitivity to alternative and algebraically equivalent expressions of the common factor hypothesis for its ability to guide the empirical model specification search. Using analytical evidence, Monte Carlo simulations, and an empirical example, it shows that the necessity to approximate the sampling variability of a nonlinear function by a Taylor series expansion causes the Wald test's sensitivity to algebraically equivalent reparameterizations of the null hypothesis. While asymptotically valid, this approximation produces considerable differences in finite samples, depending on the restriction's functional representation. In many instances, alternative null hypotheses even suggest contradictory conclusions regarding the underlying spatial process, since they converge to the Wald statistic's asymptotic χ² distribution at different rates. Given that there is no theoretical justification for any particular expression and that their performance is contingent on the relevant region of the parameter space, the results caution against relying on the Wald test's asymptotic results in any specification search strategy. Instead, practitioners should either base inferences on a simulated null distribution by estimating bootstrap critical values or turn to the LR test, which is invariant to such reparameterizations, in order to avoid spurious inferences. Future work might build on these findings by developing more reliable strategies that help practitioners differentiate between substantive and residual dependence. As Mur and Angulo ( ) show, the evidence in favor of any search strategy proposed in the literature is mixed, which explains the debate about the most appropriate strategy and prevents the development of general guidelines for the empirical identification of the correct model specification (Rüttenauer , ).
In this regard, Lacombe and LeSage ( ), for example, demonstrate that Bayesian methods constitute a promising alternative to the frequentist null hypothesis significance testing approach. Additionally, multimodel inference might help overcome the current fixation on model selection and instead allow researchers to focus on the identification of substantively meaningful spillover effects in the data (see also Juhl b). Especially given the considerable difficulties researchers face when attempting to empirically distinguish between different spatial processes (e.g., Gibbons and Overman ), following this line of investigation will enhance model building and contribute to our understanding of different interaction effects among the units of analysis.
Spatial autocorrelation poses notable challenges for the correct specification and interpretation of statistical models, as model misspecification can bias substantive inferences. Notwithstanding these difficulties, interdependencies are central to social science theories, which obliges researchers to carefully consider the process generating these dependencies when building empirical models in order to make valid inferences with respect to those theories. Consequently, especially in the absence of design-based identification strategies as proposed by Gibbons and Overman ( ), methodological research facilitating the appropriate specification of spatial models is an important contribution to a thorough assessment of theoretical expectations.

Funding
This research was funded by the German Research Foundation (DFG)-Project-ID -SFB . I also gratefully acknowledge support by the state of Baden-Württemberg through the High Performance Computing Cluster bwHPC (INST / -FUGG) and the University of Mannheim's Graduate School of Economic and Social Sciences.