Online Supplement to “Continuously Updated Indirect Inference in Heteroskedastic Spatial Models”

Spatial units typically vary over many of their characteristics, introducing potential unobserved heterogeneity which invalidates commonly used homoskedasticity conditions. In the presence of unobserved heteroskedasticity, methods based on the quasi-likelihood function generally produce inconsistent estimates of both the spatial parameter and the coefficients of the exogenous regressors. A robust generalized method of moments estimator and a modified likelihood method have been proposed in the literature to address this issue. The present paper constructs an alternative indirect inference (II) approach which relies on a simple ordinary least squares procedure as its starting point. Heteroskedasticity is accommodated by utilizing a new version of continuous updating that is applied within the II procedure to take account of the parameterization of the variance–covariance matrix of the disturbances. Finite-sample performance of the new estimator is assessed in a Monte Carlo study. The approach is implemented in an empirical application to house price data in the Boston area, where it is found that spatial effects in house price determination are much more significant under robustification to heterogeneity in the equation errors.


S.1 Derivation of bias expressions for MLE/QMLE
In this section we report the derivation of the bias function displayed in Figure 1 of the manuscript.
Corollary S1. Let ε be a vector of n independent random variables, normally distributed and such that E(εε′) = Ω_0(γ), where Ω_0(γ) is defined in (2.7) in the manuscript with σ² = 1. Let Assumptions 2-4, reported in the manuscript, hold. The leading term of B(γ, λ_0) is given by Under Ω_0(γ) in (2.7), the terms in (S.1.1), (S.1.3) and (S.1.5) do not vanish as n increases, unless γ = 0 (i.e., the homoskedastic case) and/or some specific structure of W is imposed which ensures that a condition related to (2.8) in the manuscript holds. Given the likelihood function (2.3) in the manuscript, the calculation of (S.1.1)-(S.1.4) is based on the explicit computation of moments of ratios of quadratic forms. Most of the moments of ratios involved are in fact exactly ratios of moments, since ratios of the form ε′Aε/ε′M_Xε, for a generic n × n matrix A, are independent of ε′M_Xε. However, since we are only interested in the leading terms of (S.1.6), we can approximate moments of ratios by ratios of moments even when the independence condition fails. The computation of the moments is standard (Bao and Ullah (2007)) and details are omitted here.
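The ratio-of-moments approximation can be illustrated numerically. The sketch below takes the simplest case M_X = I (no regressors): with spherical normal errors, ε′Aε/ε′ε is independent of ε′ε, so E(ε′Aε/ε′ε) = tr(A)/n holds exactly, whereas under a diagonal heteroskedastic Ω the independence fails and tr(AΩ)/tr(Ω) is only an approximation. The matrix A and the scales σ_i are hypothetical values chosen for illustration, not quantities from the manuscript.

```python
import random

def quad_form(A, e):
    """Return e'Ae for a square matrix A stored as a list of lists."""
    n = len(e)
    return sum(e[i] * A[i][j] * e[j] for i in range(n) for j in range(n))

def mc_ratio_mean(A, sigmas, reps, seed=0):
    """Monte Carlo estimate of E(e'Ae / e'e), with e_i ~ N(0, sigma_i^2) independent."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        e = [rng.gauss(0.0, s) for s in sigmas]
        total += quad_form(A, e) / sum(x * x for x in e)
    return total / reps

def ratio_of_moments(A, sigmas):
    """E(e'Ae) / E(e'e) = tr(A Omega) / tr(Omega) with Omega = diag(sigma_i^2)."""
    num = sum(A[i][i] * s * s for i, s in enumerate(sigmas))
    den = sum(s * s for s in sigmas)
    return num / den

# Hypothetical symmetric 4 x 4 matrix used only for illustration.
A = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 2.0, 0.5, 0.0],
     [0.0, 0.5, 3.0, 0.5],
     [0.0, 0.0, 0.5, 4.0]]

homo = [1.0, 1.0, 1.0, 1.0]     # spherical case: ratio of moments is exact
hetero = [0.5, 1.0, 1.5, 2.0]   # heteroskedastic case: approximation only

exact = ratio_of_moments(A, homo)            # tr(A)/n = 10/4
mc_homo = mc_ratio_mean(A, homo, 50_000)
approx = ratio_of_moments(A, hetero)
mc_hetero = mc_ratio_mean(A, hetero, 50_000)
print(exact, round(mc_homo, 3), round(approx, 3), round(mc_hetero, 3))
```

In the spherical case the Monte Carlo mean should agree with tr(A)/n up to simulation noise; in the heteroskedastic case the gap between the two numbers indicates the size of the approximation error for this particular A and Ω.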

S.2 Proofs of the Theorems
Proof of Theorem 1: Proof of part (i). Let ψ ij and ψij be the 2 × 1 vectors defined as After showing as reported in the manuscript, the rest of the proof is similar to KPR (2017). To avoid repetition, we refer to their proof when steps follow in a similar way. Define , according to (S.2.1). The {u_i, 1 ≤ i ≤ n, n = 1, 2, ...} form a triangular array of martingale differences with respect to the filtration formed by the σ-field generated by {ε_j; j < i}. Let where η is a 2 × 1 vector satisfying η′η = 1. By Theorem 2 of Scott (1973), Σ_{i=1}^{n} z_in →_d N(0, 1) if the following stability and Lindeberg conditions hold: where C_1 and C_2 contain the first and second terms in (S.2.7), respectively. All terms in C_1 are O(1), while those in C_2 are O(1/h) by standard algebra under Assumptions 3 and 4.
Existence of the limits in (S.2.7) is guaranteed under Assumption 7, and nonsingularity of C_1 is ensured by Assumptions 2, 3(ii) and 5. Thus, we can replace A by n when showing (S.2.4) and (S.2.5).
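The martingale central limit argument rests on writing a centred linear-quadratic form in independent disturbances as a sum of martingale differences. A minimal sketch for a generic form a′ε + ε′Bε − E(ε′Bε), with B symmetric and E(ε_i²) = σ_i², uses u_i = a_iε_i + b_ii(ε_i² − σ_i²) + 2ε_iΣ_{j<i} b_ijε_j, so that each u_i depends only on ε_1, ..., ε_i and has zero mean given ε_1, ..., ε_{i−1}. The vector a, the matrix B and the scales σ_i are hypothetical; the u_i of the proof are built from the ψ vectors of (S.2.1), which are not reproduced here.

```python
import random

def martingale_differences(a, B, sigmas, e):
    """Return u_1..u_n with sum(u) = a'e + e'Be - sum_i B_ii sigma_i^2 (B symmetric).

    u_i uses only e_1..e_i and has zero conditional mean given e_1..e_{i-1}
    when the e_i are independent with mean zero and variance sigma_i^2.
    """
    n = len(e)
    u = []
    for i in range(n):
        past = sum(B[i][j] * e[j] for j in range(i))  # sum over j < i of b_ij e_j
        u.append(a[i] * e[i]
                 + B[i][i] * (e[i] ** 2 - sigmas[i] ** 2)
                 + 2.0 * e[i] * past)
    return u

def centred_form(a, B, sigmas, e):
    """Direct evaluation of a'e + e'Be - E(e'Be)."""
    n = len(e)
    lin = sum(a[i] * e[i] for i in range(n))
    quad = sum(e[i] * B[i][j] * e[j] for i in range(n) for j in range(n))
    mean = sum(B[i][i] * sigmas[i] ** 2 for i in range(n))
    return lin + quad - mean

rng = random.Random(1)
n = 6
a = [rng.uniform(-1.0, 1.0) for _ in range(n)]                        # hypothetical a
M = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
B = [[0.5 * (M[i][j] + M[j][i]) for j in range(n)] for i in range(n)]  # symmetrised B
sigmas = [0.5 + 0.25 * i for i in range(n)]                            # heteroskedastic scales
e = [rng.gauss(0.0, s) for s in sigmas]

u = martingale_differences(a, B, sigmas, e)
print(sum(u), centred_form(a, B, sigmas, e))   # the two numbers coincide
```

The identity is purely algebraic (it uses b_ij = b_ji to collect the cross products), which is why the stability and Lindeberg conditions can then be checked term by term on the u_i.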
We start by establishing (S.2.4), which can equivalently be written as The latter, by standard manipulations and (S.2.6), is equivalent to showing as n → ∞.
In order to avoid repetition, we omit the proof of (S.2.9), referring to KPR and observing that for each s, v = 1, 2. Under Assumption 5, i.e. for uniformly bounded X_ij, i, j = 1, ..., n, the LHS of (S.2.12) has mean zero and variance bounded by since (S.2.11) holds and (S.2.16). Convergence to zero of the first term on the RHS of (S.2.16) can be shown as in KPR. Convergence of the third term on the RHS of (S.2.16) can be shown after observing that where β_0′X_j is uniformly bounded under Assumption 5. Thus, the second term on the RHS of (S.2.16) is bounded by similarly to KPR, under Assumptions 3-5.
The statement in Theorem 1(i) then follows by standard delta-method arguments.
Proof of part (ii). Again, we proceed similarly to KPR and refer to their proof to avoid repetition. We rewrite the binding function τ_n(λ) as where We write where and Ω (1) we can derive the limit distribution of √n(λ̂_CUII − λ_0) by the delta method, as long as the asymptotic local relative equicontinuity condition (Phillips, 2012) holds. Thus, similarly to KPR, we need to show Under Assumption 6(ii), the expression on the LHS of (S.2.25) is bounded by which, by the mean value theorem, is in turn bounded by where λ* is an intermediate point between λ_0 and r. The expression in (S.2.27) which holds under Assumptions 3-5, a derivation of which will be supplied on request.
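The II estimator inverts a binding function that maps λ to the (expected) value of the OLS estimator. The sketch below illustrates only that inversion step and swaps in a simulation-based binding function for the analytical τ_n(λ) of the manuscript: it assumes a pure SAR y = λWy + ε with homoskedastic N(0, 1) errors, a hypothetical circular W in which each unit has two neighbours with weight 1/2, common random numbers across values of λ, and bisection on [−0.9, 0.9]. It does not implement continuous updating of the variance parameterization, and bisection presumes the binding function crosses the target only once on the search interval, which should be checked for any given W.

```python
import random

def lag(y):
    """W y for a hypothetical circular W: neighbours i-1 and i+1, weight 1/2 each."""
    n = len(y)
    return [0.5 * (y[i - 1] + y[(i + 1) % n]) for i in range(n)]

def simulate_sar(lam, eps, tol=1e-10):
    """Solve y = lam * W y + eps by fixed-point iteration (a contraction for |lam| < 1)."""
    y = list(eps)
    while True:
        y_new = [lam * wy + e for wy, e in zip(lag(y), eps)]
        if max(abs(p - q) for p, q in zip(y_new, y)) < tol:
            return y_new
        y = y_new

def ols_lambda(y):
    """OLS coefficient from regressing y on W y without an intercept."""
    wy = lag(y)
    return sum(p * q for p, q in zip(wy, y)) / sum(p * p for p in wy)

def binding(lam, draws):
    """Simulated binding function: average OLS estimate at lam over fixed error draws."""
    return sum(ols_lambda(simulate_sar(lam, eps)) for eps in draws) / len(draws)

def ii_estimate(target, draws, lo=-0.9, hi=0.9, tol=1e-5):
    """Solve binding(lam) = target by bisection on [lo, hi]."""
    f_lo = binding(lo, draws) - target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = binding(mid, draws) - target
        if (f_mid > 0) == (f_lo > 0):   # sign change lies in [mid, hi]
            lo, f_lo = mid, f_mid
        else:                           # sign change lies in [lo, mid]
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(42)
n, R = 20, 50
draws = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(R)]  # common random numbers

# Sanity check: inverting the binding value at lambda = 0.4 should return about 0.4.
print(round(ii_estimate(binding(0.4, draws), draws), 4))

# One "observed" sample generated with lambda_0 = 0.4 (hypothetical data).
y_obs = simulate_sar(0.4, [rng.gauss(0.0, 1.0) for _ in range(n)])
lam_ols = ols_lambda(y_obs)
lam_ii = ii_estimate(lam_ols, draws)
print(round(lam_ols, 3), round(lam_ii, 3))
```

Holding the draws fixed across λ makes the simulated binding function a smooth deterministic function of λ, which is what allows a simple root finder to be used; if the OLS estimate falls outside the range of the binding function on the interval, this sketch simply returns a value at the boundary.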
Therefore, by a delta argument we conclude that where V_n and f_n are defined in (4.4) and (4.11), respectively. The statement in Theorem 1 follows by standard algebra once we write in terms of ā(1), b(1), c(1) and d(1). τ exists and is nonsingular under Assumption 7(ii).
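In the scalar case the delta argument reduces to a familiar rule: if √n(λ̂_OLS − τ(λ_0)) has asymptotic variance V and the II estimator is λ̂ = τ^{−1}(λ̂_OLS), then √n(λ̂ − λ_0) has asymptotic variance V/τ′(λ_0)². A minimal sketch, computing τ′ by central finite differences, is given below; the toy binding function and the value of V are hypothetical placeholders and do not correspond to V_n and f_n in (4.4) and (4.11).

```python
import math

def numerical_derivative(f, x, h=1e-5):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def ii_delta_se(binding, lam0, avar_aux, n):
    """Delta-method standard error of lam_hat = binding^{-1}(aux_hat).

    avar_aux is the asymptotic variance of sqrt(n) * (aux_hat - binding(lam0)).
    """
    slope = numerical_derivative(binding, lam0)
    return math.sqrt(avar_aux / slope ** 2 / n)

def toy_binding(lam):
    """Hypothetical smooth, increasing binding function used only for illustration."""
    return 0.1 + 1.5 * lam - 0.2 * lam ** 3

se = ii_delta_se(toy_binding, lam0=0.4, avar_aux=0.8, n=500)
print(round(se, 5))
```

A binding function that is flat near λ_0 (small τ′) inflates the variance of the II estimator, which is one reason the derivative of the binding function is required to be nonsingular.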

Proof of Theorem 2:
In order to prove (A.8) in the manuscript, we need to show We start with (S.2.31). We have, for s, t = 1, 2 The first term on the RHS of (S.2.34) has mean zero and variance bounded by The second term on the RHS of (S.2.34) has mean zero and variance bounded by Similarly, we can show that the third term on the RHS of (S.2.34) converges to zero in quadratic mean.
In order to show (S.2.32), we write where Q_i is the 1 × n vector containing the i-th row of Q and B_ij = X_i(X′X)^{−1}X_j′, as defined at the beginning of the proof of Theorem 1. By standard arguments, we can show that the last two terms on the RHS of (S.2.37) are bounded in probability by 1/√n. Thus, (S.2.32) is equivalent to as n → ∞. We therefore need to show, as n → ∞, that We only consider the leading term in v_i in (S.2.38) when showing (S.2.40)-(S.2.48), but similar routine arguments can be applied to deal with higher-order terms.
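The quantities B_ij = X_i(X′X)^{−1}X_j′ are the elements of the projection matrix P_X = X(X′X)^{−1}X′, with M_X = I − P_X. As a small numerical check, the sketch below builds P_X for a hypothetical X made of an intercept and one regressor (using the closed-form inverse of the 2 × 2 matrix X′X) and verifies three properties often used in such bounds: symmetry, idempotency, and Σ_i B_ii = k, so that 0 ≤ B_ii ≤ 1 for every i.

```python
def hat_matrix(x):
    """B = X (X'X)^{-1} X' for X with rows [1, x_i] (intercept plus one regressor)."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx            # det(X'X); nonzero unless x is constant
    inv = [[sxx / det, -sx / det],     # inverse of X'X = [[n, sx], [sx, sxx]]
           [-sx / det, n / det]]
    rows = [[1.0, v] for v in x]

    def b(i, j):
        xi, xj = rows[i], rows[j]
        return sum(xi[r] * inv[r][c] * xj[c] for r in range(2) for c in range(2))

    return [[b(i, j) for j in range(n)] for i in range(n)]

x = [0.3, 1.2, -0.7, 2.5, 0.0, 1.1]    # hypothetical regressor values
B = hat_matrix(x)
size = len(x)
trace = sum(B[i][i] for i in range(size))
BB = [[sum(B[i][k] * B[k][j] for k in range(size)) for j in range(size)]
      for i in range(size)]
print(round(trace, 10))                # k = 2 columns in X
```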
The modulus of the LHS of (S.2.40) has expectation bounded by Similarly, the modulus of the LHS of (S.2.41) has expectation bounded by The modulus of the LHS of (S.2.42) has expectation bounded by (S.2.47). (S.2.43) can be shown by arguments similar to those for (S.2.40)-(S.2.42), while (S.2.48) can be written as The modulus of the first term in the last displayed expression has expectation bounded by as in previous calculations. Similarly, the second term in (S.2.48) is O(1/(nh)), while the third term has mean zero and variance bounded by Proceeding as before, the first term in the last displayed expression is O(1/(n²h²)), while the second one is O(1/(nh²)). By Markov's inequality, this concludes the proof of (S.2.32).
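Rates such as O(1/(nh)) typically arise when each unit has about h neighbours with weights of order 1/h. For independent mean-zero disturbances with variances σ_i² and fourth moments μ4_i, Var(ε′Aε) = Σ_i a_ii²(μ4_i − σ_i⁴) + Σ_{i≠j}(a_ij² + a_ij a_ji)σ_i²σ_j². The sketch below evaluates this formula for a hypothetical symmetric circulant W in which each unit puts weight 1/h on its h nearest neighbours, with standard normal errors; in that case Var(ε′Wε/n) = 2/(nh) exactly, and Chebyshev's inequality (Markov's inequality applied to the squared deviation) turns the variance bound into a probability bound that vanishes as nh grows.

```python
def circulant_w(n, h):
    """Hypothetical symmetric W: unit i puts weight 1/h on i±1, ..., i±h/2 (h even, h < n)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for d in range(1, h // 2 + 1):
            W[i][(i + d) % n] = 1.0 / h
            W[i][(i - d) % n] = 1.0 / h
    return W

def quad_form_variance(A, sig2, mu4):
    """Var(e'Ae) for independent mean-zero e_i with variances sig2[i] and fourth moments mu4[i]."""
    n = len(A)
    diag = sum(A[i][i] ** 2 * (mu4[i] - sig2[i] ** 2) for i in range(n))
    off = sum((A[i][j] ** 2 + A[i][j] * A[j][i]) * sig2[i] * sig2[j]
              for i in range(n) for j in range(n) if i != j)
    return diag + off

def chebyshev_bound(variance, delta):
    """Upper bound on P(|Z - E Z| > delta) given Var(Z)."""
    return variance / delta ** 2

for n, h in [(50, 4), (200, 4), (200, 10)]:
    W = circulant_w(n, h)
    v = quad_form_variance(W, [1.0] * n, [3.0] * n) / n ** 2   # N(0, 1): mu4 = 3
    print(n, h, v, 2.0 / (n * h), chebyshev_bound(v, 0.1))
```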
In order to show (S.2.33), we apply a standard mean value theorem argument, such as By similar arguments to those applied to prove (S.2.31) and (S.2.32), we conclude that as n → ∞

… resulting in sparsity that amounts to about 37%. n = 506.

Figure S5: 3D plot of weight matrix W_tax. W_tax is defined such that w_ij = 1/|tax_i − tax_j|, resulting in a non-sparse structure with weights that decay with an economic distance driven by tax similarity. n = 506.

Figure S6: 3D plot of weight matrix W_school. W_school is defined such that w_ij = 1/|school_i − school_j|, resulting in a non-sparse structure with weights that decay with an economic distance driven by socioeconomic similarity. n = 506.
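The captions define the economic-distance matrices through w_ij = 1/|v_i − v_j| for an attribute v such as tax or school. The sketch below builds such a matrix from any list of attribute values. Three choices are assumptions not stated in the captions: the diagonal is set to zero, pairs with identical values (zero distance, where 1/0 is undefined) receive weight zero, and row normalisation is optional. The attribute values are hypothetical, and the share of zero off-diagonal entries is only one possible way of measuring sparsity.

```python
def inverse_distance_w(values, row_normalise=False):
    """w_ij = 1/|v_i - v_j| for i != j, zero on the diagonal.

    Assumption: pairs with identical values (zero distance) get weight 0,
    since 1/0 is undefined; the convention used in the paper may differ.
    """
    n = len(values)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = abs(values[i] - values[j])
            if i != j and d > 0:
                W[i][j] = 1.0 / d
    if row_normalise:
        for row in W:
            s = sum(row)
            if s > 0:
                row[:] = [w / s for w in row]
    return W

def off_diagonal_zero_share(W):
    """Fraction of off-diagonal entries equal to zero (one possible sparsity measure)."""
    n = len(W)
    zeros = sum(1 for i in range(n) for j in range(n) if i != j and W[i][j] == 0.0)
    return zeros / (n * (n - 1))

tax = [296.0, 242.0, 242.0, 222.0, 311.0]   # hypothetical attribute values with one tie
W = inverse_distance_w(tax)
print(round(W[0][1], 6), off_diagonal_zero_share(W))
```

With many tied attribute values the zero-distance convention matters a great deal, so any replication should follow whatever rule the original construction used.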

Table S1: Bias & MSE of CUII, ML, MQML, 2SLS and RGMM estimators for 'random' W. The ε_i's are defined as in (7.1) with ζ_i ∼ iid t(5) and σ_i defined as in (7.2). The design corresponds to an artificially dense choice of W.

Table S3: Bias & MSE of CUII, ML, MQML, 2SLS, RGMM and CUGMM estimators for 'exponential' W using 1000 Monte Carlo replications. The ε_i's are defined as in (7.1) with ζ_i ∼ iid t(5) and σ_i defined as in (7.2). The design corresponds to strongly relevant instruments.

Table S5: Bias & MSE of CUII, ML, MQML and RGMM estimators for 'exponential' W using 1000 Monte Carlo replications. The ε_i's are defined as in (7.1) with ζ_i ∼ iid t(5) and σ_i is defined as in

Table S6: Bias & MSE of CUII, ML, MQML and RGMM estimators for 'exponential' W using 1000 Monte Carlo replications. The ε_i's are defined as in (7.1) with ζ_i ∼ iid t(5) and σ_i ∼ χ²(5). The design corresponds to a misspecification setting in which the true data generating process is a pure SAR, while the fitted model includes an intercept and one exogenous regressor drawn from a uniform distribution on [0, 1].

Figure S2: 3D plot of W_geo. W_geo is defined such that w_ij = 1/geo_ij, resulting in a non-sparse structure with weights that decay with Euclidean/geographical distance. n = 506.