SIMULTANEOUS EQUATIONS MODELS WITH HIGHER-ORDER SPATIAL OR SOCIAL NETWORK INTERACTIONS

This paper develops an estimation methodology for network data generated from a system of simultaneous equations, which allows for network interdependencies via spatial lags in the endogenous and exogenous variables, as well as in the disturbances. By allowing for higher-order spatial lags, our specification provides important flexibility in modeling network interactions. The estimation methodology builds, among others, on the two-step generalized method of moments estimation approach introduced in Kelejian and Prucha (1998, Journal of Real Estate Finance and Economics 17, 99–121; 1999, International Economic Review 40, 509–533; 2004, Journal of Econometrics 118, 27–50). The paper considers limited and full information estimators, and one- and two-step estimators, and establishes their asymptotic properties. In contrast to some of the earlier two-step estimation literature, our asymptotic results facilitate joint tests for the absence of all forms of network spillovers.


INTRODUCTION
In this paper, we develop a generalized estimation theory for simultaneous equation systems for cross-sectional data with possible network interactions in the dependent variables, the exogenous variables, and the disturbances. A leading application is spatial networks. However, since network interdependencies are modeled solely through a measure of proximity, without assuming that observations are indexed by location, the developed methodology can be of interest for the estimation of a much wider class of networks, including social networks.
There is substantial empirical evidence of cross-sectional interdependence among observations in many areas of economics, both at the macro level, where cross-sectional units may, e.g., be countries, states, or counties, and at the micro level, where cross-sectional units may, e.g., be industries, firms, or individuals.¹ An important class of models for spatial networks originates from the seminal work by Whittle (1954) and Cliff and Ord (1973, 1981). In those models, cross-sectional interactions are modeled through spatial lags, where the weights used in forming the spatial lags reflect the relative importance of the links between neighbors for the generation of spillovers. In a spatial setting, the relative importance would typically be taken to be inversely related to a measure of distance. The usefulness of those models for the analysis of a wide class of networks beyond spatial networks stems from the recognition that the notion of distance is not confined to geographic distance.² Spatial econometrics has a long history in geography, regional science, and urban economics; see, e.g., Anselin (1988). For the last two decades, the development of econometric methods of inference for Cliff-Ord type models has also been an active area of research in economics.³ Most of the literature focused on single-equation models where a single dependent variable, say, $y_i$, is determined for units $i = 1,\dots,n$. However, in economics, it is frequent that the outcomes for several dependent variables, say, $y_{i1},\dots,y_{iG}$, are determined jointly by a system of equations for units $i = 1,\dots,n$. In this case, the simultaneous nature of the outcomes can stem from two sources: interactions between different economic variables as well as interactions between cross-sectional units.
Surprisingly, the literature on the estimation of simultaneous systems of spatially interrelated cross-sectional equations has been quite limited until recently. Kelejian and Prucha (2004) provide, by extending the methodology developed in Kelejian and Prucha (1998, 1999) for single equations, an early development of generalized method of moments (GMM) estimators for such models. However, as discussed in more detail below, they do not provide a full asymptotic theory for all considered estimators, and their setup only covers first-order spatial lags. Liu (2014, 2019, 2020), Cohen-Cole, Liu, and Zenou (2018), and Liu and Saraiva (2019) build on and extend the methodology of Kelejian and Prucha (2004) within the context of social interaction models with first-order spatial lags and cross-sectionally independent disturbances. Their contributions include one-step GMM estimation methods that utilize both linear and quadratic moment conditions, identification conditions, bias correction procedures for many instruments, heteroskedasticity, and an estimation methodology for a simultaneous system of equations with binary outcomes generated from an incomplete information network game. Other recent contributions to the literature on spatial simultaneous equation models include Baltagi and Deng (2015), who consider an extension of a two-equation system with first-order spatial lags to panels. Wang, Li, and Wang (2014) analyze the quasi maximum likelihood estimator for such a system in the cross section. Yang and Lee (2017) consider identification and quasi maximum likelihood estimation for a multiequation system with a first-order spatial lag in the dependent variable. Yang and Lee (2019) provide an extension to dynamic panel data models allowing for multiple weights matrices. In contrast to the current paper, the above-cited literature only considers first-order spatial lags in the dependent variable, and does not also consider spatial spillovers in the disturbance process. Those papers also differ in terms of the considered estimation methodology. In particular, a methodological focus of this paper is on two-step estimation, while one-step estimation is covered as well.

¹ For instance, Conley and Ligon (2002) and Ertur and Koch (2007) document spatial spillovers in economic growth. Holtz-Eakin (1994) and Audretsch and Feldmann (1996) put forward evidence for spatial spillovers in productivity. Egger, Pfaffermayr, and Winner (2005) provide evidence for spatial interdependencies of value added tax rates, and Devereux, Lockwood, and Redoano (2008) for spatial interdependencies of corporate tax rates. Case, Hines Jr., and Rosen (1993) report on spatial budget spillovers. The results in Behrens, Koch, and Ertur (2012) suggest that bilateral trade flows exhibit spatial interdependence, and Baltagi, Egger, and Pfaffermayr (2007, 2008) and Blonigen et al. (2007) illustrate that the same holds true for bilateral foreign direct investment. Pinkse, Slade, and Brett (2002) provide evidence for spatial price competition among wholesale-gasoline terminals. For contributions to the literature on social interactions, see, e.g., Ballester, Calvó-Armengol, and Zenou (2006); Lee (2007); Calvó-Armengol, Patacchini, and Zenou (2009); Blume et al. (2011); Cohen-Cole, Liu, and Zenou (2018); and Liu (2014).

² As stated, we think Cliff-Ord type models provide important tools for analyzing networks. However, we also want to point out an important literature in statistics where spatial dependence is modeled via Markov random fields and conditional autoregressive models; see, e.g., Cressie (1993).

³ See, e.g., Anselin (2010) for a review of the development of spatial econometric methods.
Within the context of Cliff-Ord type models, two-step procedures are generally less efficient than one-step procedures. However, because of their relative computational simplicity, they are attractive, especially in situations where the instruments are strong and the efficiency loss is small. An important limitation of the two-step estimation methodology developed in Kelejian and Prucha (2004) is that the paper only establishes the consistency of the estimator for the spatial autoregressive parameter in the disturbance process, but not its asymptotic normality. As a result, the methodology does not facilitate a joint test for the absence of spatial interactions in the dependent variables, the exogenous variables, and the disturbances. Closely related to this, Kelejian and Prucha (2004) do not consider efficiently weighted GMM estimators for the spatial autoregressive parameters, since the limiting distribution of those estimators was not available at the time.
Another important limitation of the earlier paper is that it only allowed for first-order spatial lags. Allowing for higher-order spatial lags is important for at least two reasons. First, the researcher may not be sure about the channel through which interactions occur, e.g., through geographic proximity, technological proximity, or both. By allowing for higher-order spatial lags, the researcher can consult the data on this issue. Second, as argued below, higher-order spatial lags can be used to partially relax the requirement regarding a priori knowledge of what weights should be assigned to different units in the construction of a spatial lag.⁴

⁴ Given the complexity of our systems specification, we do not pursue nonparametric estimation here.
The paper is organized as follows: Section 2 specifies the considered simultaneous equation system with spatial/cross-sectional network interactions. In this section, we also discuss two exemplary applications. The first example highlights how higher-order spatial lags can be used to achieve a more flexible specification of the spatial weights. The second example considers a social interaction model where individuals make interdependent choices on the level of effort for multiple activities. In Section 3, we discuss the moment conditions underlying the considered GMM estimators for the regression parameters and the parameters of the disturbance process. The paper focuses on two-step estimation procedures. It turns out that the distribution of the GMM estimator for the spatial autoregressive parameters of the disturbance process depends on the estimator of the regression parameters. Section 4 is hence devoted to generic results concerning the consistency and asymptotic normality of two-step GMM estimators. In particular, we give generic results concerning the joint limiting distribution of estimators for all model parameters of interest, which can be utilized in the usual way to form general Wald tests regarding the model parameters. Results for one-step estimators are in essence delivered as a special case of two-step estimation. In Section 5, we consider specific limited and full information two-step estimators, and provide specific expressions for consistent estimators of the associated asymptotic variance-covariance (VC) matrices of those estimators. In Section 6, we consider limited and full information one-step estimators that combine the linear and quadratic moment conditions used by the two-step estimators. The last section concludes with a summary of our findings and possible directions for future research. All technical derivations are given in Appendixes and in an Online Supplementary Appendix. In the Online Supplementary Appendix, we also report on a Monte Carlo study of the small sample properties of the various estimators and test statistics.
Throughout the paper, we adopt the following notations and conventions. Let $(A_n)_{n\in\mathbb{N}}$ be some sequence of matrices; then we denote the $(i,j)$th element of $A_n$ with $a_{ij,n}$. If $A_n$ is nonsingular, then we denote its inverse with $A_n^{-1}$, and the $(i,j)$th element of $A_n^{-1}$ with $a_n^{ij}$. Let $A_n$ be of dimension $p_n \times p_n$; then the maximum column sum and row sum matrix norms of $A_n$ are, respectively, defined as $\|A_n\|_1 = \max_{1\le j\le p_n}\sum_{i=1}^{p_n}|a_{ij,n}|$ and $\|A_n\|_\infty = \max_{1\le i\le p_n}\sum_{j=1}^{p_n}|a_{ij,n}|$. If $\|A_n\|_1 \le c$ and $\|A_n\|_\infty \le c$ for some finite constant $c$ which does not depend on $n$, then we say that the row and column sums of the sequence of matrices $A_n$ are uniformly bounded in absolute value. We note that if the row and column sums of the matrices $A_n$ and $B_n$ are uniformly bounded in absolute value, then so are the row and column sums of $A_n + B_n$ and $A_n B_n$; cf., e.g., Kapoor, Kelejian, and Prucha (2007), Remark A2. For any square matrix $A_n$, $A_n^{s} = (A_n + A_n')/2$, and for any vector or matrix $A$, $\|A\| = [\operatorname{tr}(A'A)]^{1/2}$, where $\operatorname{tr}$ denotes the trace operator.
Let $A_{n,g}$, $g = 1,\dots,G$, be a sequence of matrices; then $\operatorname{diag}_{g=1}^{G}(A_{n,g})$ denotes the block-diagonal matrix where $A_{n,g}$ is the $g$th diagonal block.

MODEL
In the following, we specify our simultaneous system of G equations for G endogenous variables observed for n cross-sectional units.

Structural Form Model
In specifying the system, we allow for two sources of simultaneity. First, the observations for the $g$th endogenous variable for the $i$th unit may depend on observations of the other endogenous variables for the $i$th unit, as in the classical textbook simultaneous equation system.⁵ Second, simultaneity may stem from Cliff and Ord (1973, 1981) type higher-order cross-sectional network interactions, where spatial interactions represent a leading application.
As remarked, the model specification is fairly general and allows for network interactions modeled by, possibly, higher-order spatial lags in the dependent variables, the exogenous variables, and the disturbances. More specifically, let $g$ denote the equation index; then we assume that the cross-sectional data are generated by the following system ($g = 1,\dots,G$):
$$y_{g,n} = \sum_{l=1}^{G} y_{l,n}\, b_{lg,n} + \sum_{k=1}^{K} x_{k,n}\, c_{kg,n} + \sum_{l=1}^{G}\sum_{s=1}^{p} \lambda_{lg,s,n}\, \bar{y}_{l,s,n} + u_{g,n}, \qquad u_{g,n} = \sum_{r=1}^{q} \rho_{g,r,n}\, \bar{u}_{g,r,n} + \varepsilon_{g,n}, \tag{1}$$
where $y_{g,n}$ is the $n \times 1$ vector of cross-sectional observations on the dependent variable in the $g$th equation, $x_{k,n}$ is the $n \times 1$ vector of cross-sectional observations on the $k$th exogenous variable, which is taken to be nonstochastic,⁶ $u_{g,n}$ is the $n \times 1$ disturbance vector in the $g$th equation, $W_{s,n}$ and $M_{r,n}$ are $n \times n$ weights matrices, $\varepsilon_{g,n}$ is the $n \times 1$ vector of innovations entering the disturbance process for the $g$th equation, and $n$ denotes the sample size. With $b_{lg,n}$ and $c_{kg,n}$ we denote the (scalar) parameters corresponding to the $l$th endogenous and $k$th exogenous variables, respectively. Of course, the structural model parameters are not identified without certain restrictions. Those restrictions will be introduced below. Consistent with the usual terminology for Cliff-Ord type network interactions, we refer to $W_{s,n}$ and $M_{r,n}$ as spatial weights matrices, to $\bar{y}_{l,s,n} = W_{s,n} y_{l,n}$ and $\bar{u}_{g,r,n} = M_{r,n} u_{g,n}$ as spatial lags, and to the (real scalar) parameters $\lambda_{lg,s,n}$ and $\rho_{g,r,n}$ as spatial autoregressive parameters. The weights matrices carry the information on the links between units and on the relative weight of those links, and the spatial autoregressive parameters describe the strength of the spillovers. Although originally introduced for spatial networks, Cliff-Ord type interaction models do not require the indexing of observations by location. In general, they only rely on a measure of distance in the formation of the spatial weights. Since the notions of space and distance or proximity are not confined to geographic space, these models have, as discussed in Section 1, also been applied in various other settings. This includes social-interaction models, where one considered specification has been to assign to each of the $i$th individual's friends a weight of $1/n_i$, where $n_i$ denotes the total number of friends of $i$, while assigning zero weights to individuals not belonging to the circle of friends. In the following, we continue to refer to $\bar{y}_{l,s,n}$ and $\bar{u}_{g,r,n}$ as spatial lags and to $\lambda_{lg,s,n}$ and $\rho_{g,r,n}$ as spatial autoregressive parameters, but note the wider applicability.
The reason for allowing the elements of the spatial weights matrices to depend on the sample size is to permit normalizations of these matrices, as is frequent practice in applications, where the normalization factor(s) depend on the sample size.⁷ The $i$th element of $\bar{y}_{l,s,n}$ is given by $\bar{y}_{il,s,n} = \sum_{j=1}^{n} w_{ij,s,n}\, y_{jl,n}$. We note that even if the elements of the spatial weights matrices do not depend on the sample size, the elements of the spatial lag $\bar{y}_{l,s,n}$ and, analogously, the elements of $\bar{u}_{g,r,n}$ will generally depend on the sample size. This in turn implies that also the elements of $y_{g,n}$ and $u_{g,n}$ will generally depend on the sample size, i.e., form triangular arrays. In allowing the elements of $x_{k,n}$ to depend on the sample size, we implicitly also allow for some of the exogenous variables to be spatial lags of exogenous variables. For example, the elements of $x_{k,n}$ could be of the form $x_{ik,n} = \sum_{j=1}^{n} w_{ij,s,n}\, \xi_j$, where $\xi_j$ is some basic exogenous variable. Thus the model accommodates, as remarked above, cross-sectional interactions in the endogenous variables, the exogenous variables, and the disturbances.
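The spatial lag is simply a weighted sum of the other units' outcomes. A minimal sketch, using the friendship-network weighting mentioned above (all data here are hypothetical):

```python
import numpy as np

# Hypothetical 4-unit network: each of unit i's neighbors receives equal
# weight 1/n_i; non-neighbors and the diagonal receive weight zero.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)
W = adj / adj.sum(axis=1, keepdims=True)   # row-normalize: each row sums to one

y = np.array([1.0, 2.0, 3.0, 4.0])         # observations on one endogenous variable
y_lag = W @ y                              # i-th element: sum_j w_ij * y_j

# Unit 0 averages y over its neighbors {1, 2}: (2 + 3) / 2 = 2.5
print(y_lag)
```

Here every unit happens to have two neighbors, so each element of the lag is the mean outcome of that unit's circle of friends.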
The above model generalizes the spatial simultaneous equation model considered in Kelejian and Prucha (2004) in allowing for higher-order spatial lags.⁸ Consistent with the terminology introduced by Anselin and Florax (1995) in a single-equation context, we refer to the above model as a simultaneous spatial autoregressive model of order $p$ with spatially autoregressive disturbances of order $q$, for short, a simultaneous SARAR($p$,$q$) model.⁹ One reason for allowing for multiple spatial weights matrices is that they can capture different forms of proximity between units. For example, within the context of R&D spillovers between firms, one matrix may refer to geographic proximity between firms, and the other may correspond to a measure of proximity in the product space. As another example, as discussed in more detail below, within the context of a social interaction model, different matrices may refer to different circles of friends, e.g., one matrix may identify the very close friends, and a second matrix the other friends. Additionally, as discussed below, an estimation theory that allows for multiple spatial weights matrices can also be used to accommodate certain parameterizations of the spatial weights.
Model (1) can be written more compactly as
$$Y_n = Y_n B_n + X_n C_n + \bar{Y}_n \Lambda_n + U_n, \qquad U_n = \bar{U}_n R_n + E_n, \tag{2}$$
with $Y_n = [y_{1,n},\dots,y_{G,n}]$, $X_n = [x_{1,n},\dots,x_{K,n}]$, $U_n = [u_{1,n},\dots,u_{G,n}]$, $E_n = [\varepsilon_{1,n},\dots,\varepsilon_{G,n}]$, $\bar{Y}_n = [\bar{y}_{1,1,n},\dots,\bar{y}_{1,p,n},\dots,\bar{y}_{G,1,n},\dots,\bar{y}_{G,p,n}]$, and $\bar{U}_n = [\bar{u}_{1,1,n},\dots,\bar{u}_{1,q,n},\dots,\bar{u}_{G,1,n},\dots,\bar{u}_{G,q,n}]$, and where the parameter matrices $B_n = (b_{lg,n})_{G\times G}$, $C_n = (c_{kg,n})_{K\times G}$, $\Lambda_n = (\lambda_{lg,s,n})_{pG\times G}$, and $R_n = (\rho_{g,r,n})_{qG\times G}$ are defined conformably.¹⁰

Exemplary Applications
We next motivate the importance of considering higher-order spatial lags with two examples.The first example illustrates how higher-order spatial lags can be useful for certain parameterizations of the spatial weights.The second example formulates a social interaction model where the utility maximizing solution is described by a system of equations with higher-order spillovers as defined in (1).

Parameterized Spatial Weights.
As part of the specification of the model in (1), the researcher has to specify the elements of the spatial weights matrices. When those elements are specified incorrectly, the model is misspecified and the estimates will generally be inconsistent. Allowing for higher-order spatial lags provides important flexibility and robustness in modeling network interactions. However, while adding higher-order spatial lags helps to address potential specification errors, it does not, of course, eliminate the possibility; for a recent contribution on an omnibus test for weights matrix misspecification, see Lee, Phillips, and Rossi (2021).

⁹ For single equations, higher-order SAR models have been considered by Blommestein (1983, 1985) and Huang (1984), among others, and more recently by Bell and Bockstael (2000); Cohen and Morrison Paul (2007); Badinger and Egger (2010); and Lee and Liu (2010).

¹⁰ For clarity, we note that the $g$th columns of $\Lambda_n$ and $R_n$ are, respectively, given by $[\lambda_{1g,1,n},\dots,\lambda_{1g,p,n},\dots,\lambda_{Gg,1,n},\dots,\lambda_{Gg,p,n}]'$ and $[0,\dots,0,\rho_{g,1,n},\dots,\rho_{g,q,n},0,\dots,0]'$.
In the following, we discuss exemplarily how higher-order spatial lags can be used to allow for certain flexible parameterizations of the spatial weights. For simplicity of notation, we drop subscripts $n$. Spatial weights are often specified as a function of some distance measure, possibly combined with some contiguity measure. Let $W = (w_{ij})$ be the basic spatial weights matrix, let $d_{ij}$ denote some distance measure between units $i$ and $j$, and let $d^*_{ij}$ be some contiguity measure taking on values of one or zero. Then, the researcher may specify the weights as the product of the contiguity measure and a polynomial in $d_{ij}$, treating the coefficients of the polynomial as unknown parameters:¹¹
$$w_{ij} = d^*_{ij} \sum_{s=1}^{p} \lambda_s d_{ij}^{s}.$$
Now, suppose the researcher models $y_g$ as a function of, say, $\lambda_{lg} W y_l$; then clearly
$$\lambda_{lg} W y_l = \sum_{s=1}^{p} \lambda_{lg,s} W_s y_l,$$
with $\lambda_{lg,s} = \lambda_{lg}\lambda_s$, $W_s = (w_{ij,s})$, and $w_{ij,s} = d^*_{ij} d_{ij}^{s}$. In allowing for higher-order spatial lags, model (1) covers this specification as a special case. Of course, the above specification of spatial weights is entirely illustrative, and model (1) will cover many other specifications, including specifications with alternate basis functions instead of power functions, and more general measures of distance and contiguity. The same ideas also apply to the modeling of the disturbance process.¹²

Social Interactions in Multiple Activities.

The following example extends Cohen-Cole, Liu, and Zenou (2018). We follow their basic setup, but allow for more flexible peer effects. More specifically, consider a model where $n$ individuals choose effort levels for $G$ activities, say $y_{i1},\dots,y_{iG}$, allowing for peer effects among $p$ groups of peers, e.g., for $p = 2$, we may distinguish between very close friends and other friends. Now let $w^*_{ij,s}$ be one or zero depending on whether individual $j$ belongs to the $s$th peer group of individual $i$, and let $n_{is}$ be the size of that peer group. Individual $i$'s utility $u(y_{i1},\dots,y_{iG})$ over effort levels is then specified in (3). The specification considered in Cohen-Cole, Liu, and Zenou (2018) corresponds to $p = 1$. The first-order conditions for the maximum of $u(y_{i1},\dots,y_{iG})$ yield (4), with $\pi_g = [\pi_{1g},\dots,\pi_{ng}]'$. Similar to Cohen-Cole, Liu, and Zenou (2018), assume that $\pi_g$ can be modeled as in (5). Substituting (5) into (4) then shows that the utility maximizing vectors of effort for the $G$ activities are defined as the solution of a model of the form specified in (1). We note that by specifying the $W_s$ to be block diagonal, we can accommodate situations where the individuals $i = 1,\dots,n$ belong to, say, $C$ groups which, e.g., represent class rooms. Also, some of the $x_k$ covariates may represent group indicator variables, and others may be spatial lags of some basic covariates.
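The equivalence between a first-order lag with polynomial distance weights and a sum of higher-order lags, discussed in the parameterized-weights example above, can be checked numerically. In the sketch below, the distance measure, contiguity indicator, and polynomial coefficients are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3                                # hypothetical sample size and polynomial order

d = rng.uniform(1.0, 2.0, size=(n, n))     # hypothetical distance measure d_ij
d = (d + d.T) / 2                          # make distances symmetric
d_star = (d < 1.6).astype(float)           # hypothetical zero/one contiguity indicator
np.fill_diagonal(d_star, 0.0)              # no self-links

# One weights matrix per power of the distance: w_ij,s = d*_ij * d_ij**s
W_list = [d_star * d**s for s in range(1, p + 1)]

# A first-order lag with polynomial weights equals a sum of higher-order lags:
lam = np.array([0.3, -0.1, 0.02])          # hypothetical polynomial coefficients
W_poly = sum(lam[s] * W_list[s] for s in range(p))
y = rng.normal(size=n)
lhs = W_poly @ y                           # lag built from the polynomial weights
rhs = sum(lam[s] * (W_list[s] @ y) for s in range(p))
assert np.allclose(lhs, rhs)               # the two parameterizations coincide
```

Estimating the coefficients on the individual lags $W_s y$ thus recovers the polynomial coefficients up to the scale factor $\lambda_{lg}$, which is exactly how the higher-order specification absorbs this parameterization.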

Reduced Form and Structural Model with Exclusion Restrictions
Towards computing the reduced form of the above model, let $y_n = \operatorname{vec}(Y_n)$, $u_n = \operatorname{vec}(U_n)$, $\varepsilon_n = \operatorname{vec}(E_n)$, and $x_n = \operatorname{vec}(X_n)$, and let
$$B^*_n = B_n' \otimes I_n + \sum_{s=1}^{p} \Lambda_{s,n}' \otimes W_{s,n}, \qquad R^*_n = \sum_{r=1}^{q} R_{r,n} \otimes M_{r,n},$$
where $\Lambda_{s,n} = (\lambda_{lg,s,n})_{G\times G}$ and $R_{r,n} = \operatorname{diag}_{g=1}^{G}(\rho_{g,r,n})$. Observing that $\operatorname{vec}(A_1 A_2) = (A_2' \otimes I)\operatorname{vec}(A_1)$ for any two conformable matrices $A_1$ and $A_2$, it is readily seen that the spatial simultaneous equation system (2) can be rewritten in stacked notation as
$$y_n = B^*_n y_n + (C_n' \otimes I_n)x_n + u_n, \qquad u_n = R^*_n u_n + \varepsilon_n. \tag{6}$$
Given the nonsingularity of $I_{nG} - B^*_n$ and $I_{nG} - R^*_n$, the reduced form of the system is now given by
$$y_n = (I_{nG} - B^*_n)^{-1}\left[(C_n' \otimes I_n)x_n + u_n\right], \qquad u_n = (I_{nG} - R^*_n)^{-1}\varepsilon_n. \tag{7}$$
As remarked, the structural parameters of the spatial simultaneous equation system (1) and (2) are not identified unless we impose exclusion restrictions. Let $\beta_{g,n}$, $\gamma_{g,n}$, $\lambda_{g,n}$, and $\rho_{g,n}$ denote the $G_g \times 1$, $K_g \times 1$, $p_g \times 1$, and $q_g \times 1$ vectors of nonzero elements of the $g$th column of $B_n$, $C_n$, $\Lambda_n$, and $R_n$, respectively, and let $Y_{g,n}$, $X_{g,n}$, $\bar{Y}_{g,n}$, and $\bar{U}_{g,n}$ be the corresponding matrices of observations on the endogenous variables, exogenous variables, spatially lagged endogenous variables, and spatially lagged disturbances appearing in the structural equation for the $g$th endogenous variable. Then, system (2) can be expressed as ($g = 1,\dots,G$):
$$y_{g,n} = Y_{g,n}\beta_{g,n} + X_{g,n}\gamma_{g,n} + \bar{Y}_{g,n}\lambda_{g,n} + u_{g,n} = Z_{g,n}\delta_{g,n} + u_{g,n}, \qquad u_{g,n} = \bar{U}_{g,n}\rho_{g,n} + \varepsilon_{g,n}, \tag{8}$$
where $Z_{g,n} = [Y_{g,n}, X_{g,n}, \bar{Y}_{g,n}]$ and $\delta_{g,n} = [\beta_{g,n}', \gamma_{g,n}', \lambda_{g,n}']'$.
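For small systems, the reduced form can be computed by solving the stacked linear system directly rather than forming an explicit inverse. A sketch with hypothetical parameter values for G = 2 equations and a single spatial weights matrix; the stacked coefficient matrix is formed with Kronecker products, consistent with the vec identity noted above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, G = 4, 2                                # hypothetical sample size and number of equations

W = rng.uniform(size=(n, n))
np.fill_diagonal(W, 0.0)                   # zero diagonal
W /= np.abs(W).sum(axis=1, keepdims=True)  # normalize so row sums equal one

B = np.array([[0.0, 0.4], [0.2, 0.0]])     # hypothetical cross-equation coefficients
Lam = np.array([[0.3, 0.1], [0.0, 0.2]])   # hypothetical spatial-lag coefficients (s = 1)
I_nG = np.eye(n * G)

# Stacked coefficient matrix: B* = B' kron I_n + Lam' kron W (single lag case)
B_star = np.kron(B.T, np.eye(n)) + np.kron(Lam.T, W)

u = rng.normal(size=n * G)                 # stacked disturbances (taken as given here)
cx = rng.normal(size=n * G)                # stacked exogenous part (C' kron I_n) x
y = np.linalg.solve(I_nG - B_star, cx + u) # reduced form: y = (I - B*)^{-1}(cx + u)

# Sanity check: y solves the structural stacked system y = B* y + cx + u
assert np.allclose(y, B_star @ y + cx + u)
```

Using `np.linalg.solve` avoids inverting the nG×nG matrix; for realistic sample sizes one would exploit sparsity of the weights matrices instead.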

Model Assumptions
We maintain the following assumptions regarding the spatial weights matrices and model parameters.
Assumption 1. For $s = 1,\dots,p$ and $r = 1,\dots,q$: (a) All diagonal elements of $W_{s,n}$ and $M_{r,n}$ are zero. (b) $\|W_{s,n}\|_1 \le c$ and $\|M_{r,n}\|_1 \le c$ for some finite constant $c$ which does not depend on $n$, and $\|W_{s,n}\|_\infty = 1$ and $\|M_{r,n}\|_\infty = 1$.

Assumption 2. (a) The matrices $I_{nG} - B^*_n$ are nonsingular, and the row and column sums of the matrices $[I_{nG} - B^*_n]^{-1}$ are uniformly bounded in absolute value. (b) The spatial autoregressive parameters satisfy $\sup_n \sum_{r\in I_{g,\rho}} |\rho_{g,r,n}| < 1$ for $g = 1,\dots,G$, where $I_{g,\rho} = \{r_{g,1},\dots,r_{g,q_g}\} \subseteq \{1,\dots,q\}$ denotes the set of indices associated with the elements of $\rho_{g,n}$.
The above assumptions are in line with the recent spatial literature. Assumption 1(a) entails a normalization rule. Assumption 1(b) implies that the row and column sums of the matrices $W_{s,n}$ and $M_{r,n}$ are uniformly bounded in absolute value.
The assumption that $\|W_{s,n}\|_\infty = 1$ and $\|M_{r,n}\|_\infty = 1$ implies a normalization for the parameters. For interpretation, let $W^*_{s,n}$ be some (unnormalized) spatial weights matrix and let $\lambda^*_{lg,s,n}$ be the corresponding spatial autoregressive parameter on $W^*_{s,n} y_{l,n}$. Now define $W_{s,n} = W^*_{s,n}/\|W^*_{s,n}\|_\infty$ and $\lambda_{lg,s,n} = \lambda^*_{lg,s,n}\|W^*_{s,n}\|_\infty$; then $\|W_{s,n}\|_\infty = 1$ and $\lambda_{lg,s,n} W_{s,n} = \lambda^*_{lg,s,n} W^*_{s,n}$. Thus the normalizations $\|W_{s,n}\|_\infty = 1$ and $\|M_{r,n}\|_\infty = 1$ can always be achieved by appropriately re-scaling the elements of the spatial weights matrix, provided the corresponding spatial autoregressive parameter is correspondingly redefined; for further discussions, see Kelejian and Prucha (2010).¹³ Assumption 2(a) ensures that the first equation of the expression for the reduced form in (7) is well defined. In allowing in Assumption 2(b) for the index set $I_{g,\rho}$ to vary with $g$, we allow for different orders of spatial lags in the disturbance process of different equations. Next, observe that Assumptions 1(b) and 2(b) imply that $\sup_n \|R^*_{g,n}\|_\infty < 1$, where $R^*_{g,n} = \sum_{r\in I_{g,\rho}} \rho_{g,r,n} M_{r,n}$, and hence that $I_{nG} - R^*_n$ is nonsingular. Consequently, also the second equation of the expression for the reduced form in (7) is well defined, and thus $y_n$ is uniquely defined by the model. Those assumptions imply furthermore that the row sums of the matrices $(I_n - R^*_{g,n})^{-1} = \sum_{k=0}^{\infty}(R^*_{g,n})^k$ are uniformly bounded in absolute value. The above arguments use results in Horn and Johnson (1985, p. 301).
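The rescaling argument above can be verified numerically; a minimal sketch with a hypothetical weights matrix and parameter value:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
W_star = rng.uniform(size=(n, n))          # hypothetical unnormalized weights matrix
np.fill_diagonal(W_star, 0.0)              # zero diagonal (Assumption 1(a))

row_norm = np.abs(W_star).sum(axis=1).max()  # maximum absolute row sum ||W*||_inf
W = W_star / row_norm                        # normalized matrix with ||W||_inf = 1

lam_star = 0.4                             # hypothetical original spatial parameter
lam = lam_star * row_norm                  # redefined parameter after normalization

y = rng.normal(size=n)
# The spatial-lag term is unchanged by the joint rescaling:
assert np.allclose(lam * (W @ y), lam_star * (W_star @ y))
assert np.isclose(np.abs(W).sum(axis=1).max(), 1.0)
```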

Assumption 3. (a) The matrix of (nonstochastic) exogenous regressors $X_n$ in (2) has full column rank (for $n$ sufficiently large). Furthermore, the elements of $X_n$ are uniformly bounded in absolute value by some finite constant. (b) The elements of the parameter matrices $B_n$, $C_n$, and $\Lambda_n$ are uniformly bounded in absolute value.
An assumption such as Assumption 3(a) is common in the spatial literature. In treating $X_n$ as nonstochastic, our analysis should be viewed as conditional on $X_n$. Note that in part (b) of Assumption 3, uniformity refers to $n$. Assumption 3(b) is trivially satisfied if the parameters do not depend on the sample size, since any real number is finite.
We next state the assumptions maintained with respect to $\varepsilon_n$. In the following, let $V_n = (v_{ig,n})_{n\times G}$.

Assumption 4. The innovations $\varepsilon_n$ are generated as $E_n = V_n \Sigma^{1/2\prime}$, where $\Sigma^{1/2}$ is a nonsingular $G \times G$ matrix and the random variables $\{v_{ig,n}: i = 1,\dots,n,\, g = 1,\dots,G\}$ are, for each $n$, identically and independently distributed with zero mean, unitary variance, and finite $4 + \nu$ moments for some $\nu > 0$, and their distribution does not depend on $n$. Furthermore, let $\Sigma = \Sigma^{1/2}\Sigma^{1/2\prime}$.
The above assumption on the innovation process is in line with the specification of the disturbance terms for a classical simultaneous equation system. Let $\varepsilon_n(i)'$ denote the $i$th row of $E_n$; then, observing that $E_n = V_n\Sigma^{1/2\prime}$, it is readily seen that the innovation vectors $\{\varepsilon_n(i): 1 \le i \le n\}$ are i.i.d. with zero mean and VC matrix $\Sigma$. With respect to the stacked innovation vector, the assumption implies that $E\varepsilon_n = 0$ and $E\varepsilon_n\varepsilon_n' = \Sigma \otimes I_n$.
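The innovation structure in Assumption 4 is straightforward to simulate: draw i.i.d. standardized variates and postmultiply each row by the transpose of a square root of the target VC matrix. A sketch with a hypothetical 2×2 matrix Σ, using the Cholesky factor as one choice of the nonsingular G × G factor:

```python
import numpy as np

rng = np.random.default_rng(3)
n, G = 100_000, 2
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])             # hypothetical cross-equation VC matrix
S_half = np.linalg.cholesky(Sigma)         # one choice of nonsingular square root

V = rng.normal(size=(n, G))                # i.i.d. entries with zero mean, unit variance
E = V @ S_half.T                           # each row eps(i)' then has VC matrix Sigma

emp = E.T @ E / n                          # sample VC matrix of the rows
assert np.allclose(emp, Sigma, atol=0.05)  # close to Sigma for large n
```

Any other nonsingular square root (e.g., the symmetric one from an eigendecomposition) would produce rows with the same VC matrix.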
Given (7), we note that Assumption 4 implies, furthermore, that $Eu_n = 0$ and $Ey_n = (I_{nG} - B^*_n)^{-1}(C_n' \otimes I_n)x_n$, and that the VC matrices of $u_n$ and $y_n$ are given by, respectively,
$$\Omega_{u,n} = (I_{nG} - R^*_n)^{-1}(\Sigma \otimes I_n)(I_{nG} - R^{*\prime}_n)^{-1}, \qquad \Omega_{y,n} = (I_{nG} - B^*_n)^{-1}\,\Omega_{u,n}\,(I_{nG} - B^{*\prime}_n)^{-1}.$$
Assumptions 2 and 4 imply that the row and column sums of the VC matrix of $u_n$ (and similarly those of $y_n$) are uniformly bounded in absolute value, thus limiting the degree of correlation between, respectively, the elements of $u_n$ and of $y_n$.
Remark. Under the above assumptions, it is shown in Lemmata A.2 and A.3 that all random variables in $[Y_n, X_n, \bar{Y}_n]$ have uniformly bounded finite fourth moments, and that corresponding moment bounds hold for quadratic forms based on any $n \times n$ real matrix $A_n$ whose row and column sums are bounded uniformly in absolute value.
For purposes of estimation, it proves helpful to apply a spatial Cochrane-Orcutt transformation to the model. In particular, premultiplying (8) by $I_n - R^*_{g,n}$, with $R^*_{g,n} = \sum_{r\in I_{g,\rho}} \rho_{g,r,n} M_{r,n}$, yields
$$y^*_{g,n} = Z^*_{g,n}\delta_{g,n} + \varepsilon_{g,n}, \tag{10}$$
where $y^*_{g,n} = (I_n - R^*_{g,n})y_{g,n}$ and $Z^*_{g,n} = (I_n - R^*_{g,n})Z_{g,n}$. Stacking the transformed equations yields
$$y^*_n = Z^*_n\delta_n + \varepsilon_n,$$
where $y^*_n = [y^{*\prime}_{1,n},\dots,y^{*\prime}_{G,n}]'$, $Z^*_n = \operatorname{diag}_{g=1}^{G}(Z^*_{g,n})$, $\delta_n = [\delta_{1,n}',\dots,\delta_{G,n}']'$, and $\varepsilon_n$ is as defined above.
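The spatial Cochrane-Orcutt transformation amounts to premultiplying each equation by the identity minus the weighted sum of disturbance weights matrices. A sketch with hypothetical weights matrices, parameter values, and dimensions (q = 2):

```python
import numpy as np

rng = np.random.default_rng(4)
n, q = 6, 2

M = []
for _ in range(q):                         # hypothetical disturbance weights matrices
    m = rng.uniform(size=(n, n))
    np.fill_diagonal(m, 0.0)
    M.append(m / np.abs(m).sum(axis=1, keepdims=True))

rho = np.array([0.3, 0.2])                 # hypothetical autoregressive parameters
P = np.eye(n) - sum(rho[r] * M[r] for r in range(q))   # transformation matrix

y_g = rng.normal(size=n)
Z_g = rng.normal(size=(n, 3))              # regressors of the g-th equation

y_star = P @ y_g                           # transformed dependent variable
Z_star = P @ Z_g                           # transformed regressors

# If u_g follows u = sum_r rho_r M_r u + eps, then applying P recovers eps:
eps = rng.normal(size=n)
u_g = np.linalg.solve(P, eps)
assert np.allclose(P @ u_g, eps)
```

Since the row sums of the weighted combination are below one here, P is nonsingular, mirroring the role of the parameter-space condition in Assumption 2(b).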

MOMENT CONDITIONS
Recall that from the Cochrane-Orcutt transformed form of the model (10), we have $\varepsilon_{g,n} = y^*_{g,n} - Z^*_{g,n}\delta_{g,n} = (I_n - R^*_{g,n})(y_{g,n} - Z_{g,n}\delta_{g,n})$. The estimators for the model parameters $\rho_{g,n}$ and $\delta_{g,n}$ considered in this paper will utilize a set of linear and quadratic moment conditions of the form ($g = 1,\dots,G$)
$$E\, n^{-1} H_n'\varepsilon_{g,n} = 0, \qquad E\, n^{-1}\varepsilon_{g,n}' A_{s,n}\,\varepsilon_{g,n} = 0, \quad s = 1,\dots,S, \tag{12}$$
where the $n \times p_H$ instrument matrix $H_n$ in the linear form and the $n \times n$ weighting matrices $A_{s,n}$ in the quadratic forms are nonstochastic. In the following, we will also simply write $m^{\delta}_{g,n}$ and $m^{\rho}_{g,n}$ for the sample moment vectors at the true parameter values.¹⁴ We maintain the following assumptions regarding the instruments $H_n$. Specific choices of instruments and a discussion of how the nonlinearity of $Ey_n$ in the parameters generates instruments from within the model are given after the assumptions.
Assumption 5. The instrument matrices $H_n$ are nonstochastic and have full column rank $p_H \ge G_g + K_g + p_g$ (for all $n$ large enough). Furthermore, the elements of the matrices $H_n$ are uniformly bounded in absolute value. Additionally, $H_n$ is assumed to contain, at least, the linearly independent columns of $\bar{H}_n = [X_n, M_{1,n}X_n,\dots,M_{q,n}X_n]$.

The inclusion of $\bar{H}_n$ in $H_n$ ensures that the exogenous variables on the right hand side (r.h.s.) of the Cochrane-Orcutt transformed model serve as their own best instruments. For limited information estimators, it suffices to postulate that $H_n$ contains, at least, the linearly independent columns of $[X_{g,n}, M_{r_{g,1},n}X_{g,n},\dots,M_{r_{g,q_g},n}X_{g,n}]$.

Assumption 6. The instruments $H_n$ satisfy furthermore: (a) $Q_{HH} = \lim_{n\to\infty} n^{-1}H_n'H_n$ is finite and nonsingular; (b) $Q_{HZ_g} = \lim_{n\to\infty} n^{-1}H_n'EZ_{g,n}$ is finite and has full column rank; (c) $Q_{HZ^*_g} = \lim_{n\to\infty} n^{-1}H_n'EZ^*_{g,n}$ is finite and has full column rank.

The above assumptions are in the spirit of those maintained, e.g., in Kelejian and Prucha (1998, 2004, 2010) and Lee (2003). We first discuss Assumption 5.
The best instruments for $Y_{g,n}$ and $\bar{Y}_{g,n}$ are given by their conditional means. Observe that in light of (6) we have $Ey_n = (I_{nG} - B^*_n)^{-1}(C_n' \otimes I_n)x_n$. For large $n$, the accurate computation of the inverse of $I_{nG} - B^*_n$, which is of dimension $nG \times nG$ and depends on unknown parameters, will be challenging if not impossible, unless the weights matrices are sparse. Furthermore, even in the single equation case, existing results on the asymptotic properties of GMM estimators based on the best instruments have so far only been obtained by restricting the parameter space to a compact interval in, say, $(-1,1)$. To avoid those difficulties and limitations, we employ an approximation of the best instruments, which is consistent in spirit with the approach adopted in the above cited literature. In light of the structure of $B^*_n$, it is not difficult to see that the blocks of $(I_{nG} - B^*_n)^{-1}$ can be expressed as infinite weighted sums of products of the matrices $W_{1,n},\dots,W_{p,n}$. Suppose $W_{s_1},\dots,W_{s_{\bar{g}}}$ are the spatial weights matrices appearing in $\bar{Y}_{g,n}$; then by including in $H_n$ the linearly independent columns of $H_{R,n}, W_{s_1}H_{R,n},\dots,W_{s_{\bar{g}}}H_{R,n}$, we may view the fitted values of $Z_{g,n}$ as computationally simple approximations of the best instruments $EZ_{g,n}$. Suppose further that $M_{r_1},\dots,M_{r_{\bar{g}}}$ are the spatial weights matrices in the disturbance process of the $g$th equation; then by including in $H_n$ also the linearly independent columns of $M_{r_1}H_{R,n},\dots,M_{r_{\bar{g}}}H_{R,n}$, we may view the fitted values of $Z^*_{g,n}$ as computationally simple approximations of the best instruments $EZ^*_{g,n}$. The Monte Carlo results presented in the Online Supplementary Appendix suggest that in many situations, relatively low values of $R$ are sufficient for providing a good approximation. A discussion, in a simplified context, as to how instruments are generated from within the model is given in Appendix F in the Online Supplementary Appendix.
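Approximating instrument sets of this kind can be assembled mechanically from the exogenous regressors and their low-order spatial lags. A sketch under hypothetical data with a small truncation order R; the column-selection loop is one simple (if naive) way to retain only linearly independent columns:

```python
import numpy as np

rng = np.random.default_rng(5)
n, K, R = 50, 2, 2                          # hypothetical sizes; R = truncation order

X = rng.normal(size=(n, K))                 # exogenous regressors
W = rng.uniform(size=(n, n))
np.fill_diagonal(W, 0.0)
W /= np.abs(W).sum(axis=1, keepdims=True)   # normalized hypothetical weights matrix

# Candidate instruments: X and its spatial lags up to order R
blocks = [X]
for _ in range(R):
    blocks.append(W @ blocks[-1])           # W X, W^2 X, ...
H_candidates = np.hstack(blocks)

# Keep only linearly independent columns
H = np.empty((n, 0))
for j in range(H_candidates.shape[1]):
    trial = np.hstack([H, H_candidates[:, [j]]])
    if np.linalg.matrix_rank(trial) > H.shape[1]:
        H = trial
print(H.shape)   # at most (n, K * (R + 1)) instrument columns
```

In applications one would also append lags formed with the other weights matrices of the equation, as described above; the point of the sketch is only the truncated-power-series logic behind the approximation.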
Assumption 6(a) is standard. Assumption 6(b) is a sufficient condition that ensures the identification of $\delta_{g,n}$ from the linear moment conditions corresponding to the untransformed model (8). Assumption 6(c) is used to ensure identification from the linear moment conditions corresponding to the transformed model (10). Those assumptions are crucial for the consistency of the first-step estimators of $\delta_{g,n}$ based on linear moment conditions only. A more detailed discussion of those assumptions, including scenarios where those conditions will not hold, is provided in Appendix F in the Online Supplementary Appendix. Of course, within the context of one-step estimation, identification is still possible through the use of the quadratic moment conditions, even if identification by the linear moment conditions fails. More detailed remarks and references are provided in Appendix F in the Online Supplementary Appendix.
We will maintain the following assumptions regarding the matrices A s,n in the quadratic moment conditions (12).
Assumption 7. The row and column sums of the matrices A_s,n, s = 1,...,S, are bounded uniformly in absolute value by some finite constant, and all diagonal elements of A_s,n are zero for s = 1,...,S.
The assumption that the diagonal elements of A_s,n are zero ensures that the moment conditions are robust against heteroskedasticity. Exemplary specifications for A_s,n include M_r,n and M'_r,n M_r,n − diag(M'_r,n M_r,n). For computational purposes and for proving consistency, it is convenient to rewrite the moment conditions in (13) as γ_g,n = Γ_g,n r_g,n(ρ_g,n), where r_g,n(ρ_g) = [ρ'_g, r'_2,g,n, r'_3,g,n]' with r_2,g,n = (ρ²_g,r_{g,1},n,...,ρ²_g,r_{g,q_g},n)' and r_3,g,n = (ρ_g,r_{g,1},n ρ_g,r_{g,2},n,...,ρ_g,r_{g,1},n ρ_g,r_{g,q_g},n,...,ρ_g,r_{g,q_g−1},n ρ_g,r_{g,q_g},n)', recalling the definition of the index set I_g,ρ = {r_{g,1},...,r_{g,q_g}} ⊆ {1,...,q}, and where q*_g = 2q_g + q_g(q_g − 1)/2. For the case of two-step estimation, let δ̃_g,n be some estimator for δ_g,n, let ũ_g,n = y_g,n − Z_g,n δ̃_g,n, and let Γ̃_g,n and γ̃_g,n denote the corresponding estimators of Γ_g,n and γ_g,n, respectively, which are obtained by suppressing the expectations operator and replacing u_g,n by ũ_g,n in the above expressions.
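The role of the zero-diagonal restriction can be illustrated as follows. The sketch below forms two exemplary A matrices from a weights matrix M; the specific choices mirror common specifications in this literature but are our own illustration, not the paper's definitive list.

```python
import numpy as np

def quadratic_moment_matrices(M):
    """Exemplary zero-diagonal matrices for the quadratic moments
    (cf. Assumption 7): A1 = M'M - diag(M'M) and A2 = M with its
    diagonal zeroed.  Illustrative choices only; the text allows any
    uniformly bounded zero-diagonal specification."""
    A1 = M.T @ M
    np.fill_diagonal(A1, 0.0)
    A2 = M.copy()
    np.fill_diagonal(A2, 0.0)
    return A1, A2
```

With diag(A) = 0 we have E[ε'Aε] = Σ_i a_ii Var(ε_i) = tr(A Var(ε)) = 0 for independent innovations regardless of heteroskedasticity, which is the robustness property noted above.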

GENERIC ASYMPTOTIC PROPERTIES
In this section, we give a generic discussion of the asymptotic properties of GMM estimators for ρ_g,n and δ_g,n corresponding to the moment conditions in (12) and (13). A main focus is on two-step estimators. Two-step estimators are appealing since they are computationally simple, and since, in a single-equation context, the loss of efficiency has been found to be small under various reasonable scenarios, provided that the instruments are not weak; see, e.g., Das, Kelejian, and Prucha (2003) and the Monte Carlo results presented in the Online Supplementary Appendix. A two-step procedure may also provide some robustness in the estimation of δ_g,n against misspecification of the disturbance process. As will be seen below, the limiting distribution of the GMM estimators for ρ_g,n will depend on the estimator of δ_g,n used in computing the estimated residuals. That is, δ_g,n is not a nuisance parameter for ρ_g,n (although the reverse is true). As a consequence, the derivation of the limiting distribution of the two-step estimators is technically more challenging. We will discuss one-step estimators later in the paper; in essence, the results in this section also deliver the limiting distribution of one-step estimators as a special case.
The GMM estimators for the autoregressive parameter vectors ρ_n introduced below generalize the GMM estimators in Kelejian and Prucha (2004). In contrast to Kelejian and Prucha (2004), we not only accommodate higher-order spatial disturbance processes, but also provide a full asymptotic theory for those estimators. As a result, we are able to introduce more efficient estimators and to provide results on the joint limiting distribution of the estimators for all model parameters.

Consistency of GMM Estimator for ρ
Let ϒ̃_gg,n be some S × S symmetric positive semidefinite (moments) weighting matrix. Then a corresponding GMM estimator for ρ_g,n can be defined as

ρ̃_g,n = argmin_{ρ_g ∈ [−a_ρ, a_ρ]^{q_g}} m^ρ_g,n(ρ_g, δ̃_g,n)' ϒ̃_gg,n m^ρ_g,n(ρ_g, δ̃_g,n),   (16)

with a_ρ ≥ 1. We note that the objective function for ρ_g,n remains well defined even for values of ρ_g for which I_n − R*_g,n(ρ_g) is singular, which allows us to take as the optimization space a compact set containing the true parameter space.
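The estimator (16) can be illustrated, for a first-order disturbance process, by the following sketch, which minimizes the weighted quadratic form in the moment residuals over a compact set by grid search. The notation Γ, γ follows the rewritten moment conditions above; the grid-search optimizer and the restriction to q_g = 1 (so that r(ρ) = (ρ, ρ²)') are simplifying assumptions of ours.

```python
import numpy as np

def gmm_rho_objective(rho, Gamma, gamma, Upsilon):
    """Objective [gamma - Gamma r(rho)]' Upsilon [gamma - Gamma r(rho)]
    for a first-order process, r(rho) = (rho, rho^2)'.  Simplified
    sketch of (16); the paper's r(.) also stacks cross products for
    higher-order rho vectors."""
    r = np.array([rho, rho ** 2])
    m = gamma - Gamma @ r
    return float(m @ Upsilon @ m)

def gmm_rho_estimator(Gamma, gamma, Upsilon,
                      grid=np.linspace(-1.5, 1.5, 3001)):
    # Grid search over a compact optimization set containing the true
    # parameter space; the objective is well defined on all of it.
    vals = [gmm_rho_objective(r, Gamma, gamma, Upsilon) for r in grid]
    return float(grid[int(np.argmin(vals))])
```

With ϒ = I the estimator reduces to the nonlinear least-squares case discussed below Assumption 9.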
We postulate the following additional assumption to establish consistency of ρ g,n .
Assumption 8. The smallest eigenvalue of Γ'_g,n Γ_g,n is uniformly bounded away from zero.
Assumption 9. ϒ̃_g,n − ϒ_g,n = o_p(1), where ϒ_g,n is an S × S nonstochastic symmetric positive definite matrix. The largest eigenvalues of ϒ_g,n are bounded uniformly from above, and the smallest eigenvalues of ϒ_g,n are bounded uniformly away from zero (and thus, by the equivalence of matrix norms, ϒ_g,n and ϒ^{-1}_g,n are O(1)).
Assumption 8 requires Γ'_g,n Γ_g,n to be nonsingular and, in conjunction with Assumption 9, ensures that the smallest eigenvalue of Γ'_g,n ϒ_g,n Γ_g,n is uniformly bounded away from zero. This will be sufficient to demonstrate that ρ_g,n is identifiably unique w.r.t. the nonstochastic analog of the objective function of the GMM estimator. Some additional remarks on this assumption and connections to the GMM literature are provided in Appendix F in the Online Supplementary Appendix. Assumption 9 ensures that ϒ̃_g,n is positive definite with probability tending to one. Of course, Assumption 9 is satisfied for ϒ̃_g,n = ϒ_g,n = I_S; in this case, the estimator defined by (16) reduces to a nonlinear least-squares estimator. Choices of ϒ̃_g,n that result in efficient estimates of ρ_g,n will be discussed below in conjunction with the asymptotic normality result.
Our basic consistency result for ρ g,n is given by the next theorem.

Asymptotic Distribution of GMM Estimator for ρ
The limiting distribution of the GMM estimator ρ̃_g,n will generally depend on the estimator δ̃_g,n used to compute the estimated disturbances. To define GMM estimators for δ_g,n, we can employ the moment conditions (12). Leading examples of limited and full information GMM estimators for δ_g,n will be the spatial two-stage least squares (2SLS) and three-stage least squares (3SLS) estimators defined in the next section. To keep the discussion general, we maintain the following assumption regarding δ̃_g,n.
Assumption 10. The estimator δ̃_g,n is asymptotically linear in ε_n in the sense that

n^{1/2}(δ̃_g,n − δ_g,n) = n^{-1/2} Σ_{h=1}^{G} T'_gh,n ε_h,n + o_p(1),

with T_gh,n = F_gh,n P_gh,n, where F_gh,n and P_gh,n are, respectively, n × p_F and p_F × p_{δ_g} real nonstochastic matrices whose elements are uniformly bounded in absolute value by a finite constant, and where p_{δ_g} is the dimension of the parameter vector δ_g,n. (We note that under the maintained assumptions the elements of T_gh,n are again uniformly bounded in absolute value by a finite constant.)
In Appendix A, we show that our spatial 2SLS and 3SLS estimators satisfy this assumption. For 2SLS estimators of the parameters of, say, the first equation, Σ_{h=1}^{G} T'_1h,n ε_h,n reduces to T'_11,n ε_1,n. Also note that under Assumptions 4 and 10, we have n^{1/2}(δ̃_g,n − δ_g,n) = O_p(1), as is assumed in Theorem 1.
In preparation for our result concerning the asymptotic distribution of the GMM estimator, we next define the matrices that will compose the limiting VC matrix. In particular, consider the S × q_g matrix J_g,n defined in (17) and the S × S matrix Ψ^ρρ_gg,n = (ψ^ρρ_rs,gg,n) defined in (18). As shown in the proof of the subsequent theorem, Ψ^ρρ_gg,n is the asymptotic VC matrix of the sample moment vector m^ρ_g,n(ρ_g,n, δ̃_g,n). The second term in (18) stems from the fact that the sample moment vector depends on estimated residuals. If the true residuals could be observed, we could take δ̃_g,n = δ_g,n or T_gh,n = 0, in which case the second term in (18) is zero.
THEOREM 2 (Asymptotic Normality). Let ρ̃_g,n = ρ̃_g,n(ϒ̃_g,n) denote the GMM estimator for ρ_g,n defined by (16). Then, given Assumptions 1-10, and given that λ_min(Ψ^ρρ_gg,n) ≥ c* > 0, we have that n^{1/2}(ρ̃_g,n − ρ_g,n) is asymptotically distributed as N(0, Ω^ρ_g,n), where

Ω^ρ_g,n = (J'_g,n ϒ_g,n J_g,n)^{-1} J'_g,n ϒ_g,n Ψ^ρρ_gg,n ϒ_g,n J_g,n (J'_g,n ϒ_g,n J_g,n)^{-1}.

The above theorem implies that the difference between the cumulative distribution function of n^{1/2}(ρ̃_g,n − ρ_g,n) and that of N(0, Ω^ρ_g,n) converges pointwise to zero, which justifies the use of the latter distribution as an approximation of the former.

Remark. Clearly, Ω^ρ_g,n(ϒ_g,n) − Ω^ρ_g,n((Ψ^ρρ_gg,n)^{-1}) is positive semidefinite for any ϒ_g,n. Thus, choosing ϒ̃_g,n as a consistent estimator for (Ψ^ρρ_gg,n)^{-1} leads to the efficient GMM estimator. As discussed in the proof of the above theorem, the elements of Ψ^ρρ_gg,n are uniformly bounded in absolute value and, hence, λ_max(Ψ^ρρ_gg,n) ≤ c** for some c** < ∞. Since by assumption also 0 < c* ≤ λ_min(Ψ^ρρ_gg,n), it follows that the conditions on the eigenvalues of ϒ_g,n postulated in Assumption 9 are automatically satisfied by ϒ_g,n = (Ψ^ρρ_gg,n)^{-1}. We next define a consistent estimator for Ω^ρ_g,n(ϒ_g,n). As a preliminary result, we have the following lemma.
LEMMA 1. Suppose Assumptions 1-4 hold for g,h = 1,...,G. We note that in the above lemma, ρ̃_g,n and δ̃_g,n can be any consistent estimators. Now, let Γ̃_g,n be defined as in (15), and, corresponding to (17), define J̃_g,n = J_g,n(ρ̃_g,n). Furthermore, corresponding to (18), consider the following estimator Ψ̃^ρρ_gg,n = (ψ̃^ρρ_rs,gg,n) for Ψ^ρρ_gg,n, where T̃_gh,n is some estimator for T_gh,n and ũ_g,n = y_g,n − Z_g,n δ̃_g,n. Given the estimators J̃_g,n and Ψ̃^ρρ_gg,n, we can now formulate the following estimator for Ω^ρ_g,n:

Ω̃^ρ_g,n = (J̃'_g,n ϒ̃_g,n J̃_g,n)^{-1} J̃'_g,n ϒ̃_g,n Ψ̃^ρρ_gg,n ϒ̃_g,n J̃_g,n (J̃'_g,n ϒ̃_g,n J̃_g,n)^{-1}.

The next theorem establishes the consistency of Ω̃^ρ_g,n. Note that for the first part of the above theorem, ρ̃_g,n can be any consistent estimator.

Joint Asymptotic Distribution of Estimators for ρ and δ
In the following, let δ_n = [δ'_1,n,...,δ'_G,n]' and ρ_n = [ρ'_1,n,...,ρ'_G,n]', and let δ̃_n = [δ̃'_1,n,...,δ̃'_G,n]' and ρ̃_n = [ρ̃'_1,n,...,ρ̃'_G,n]' denote the corresponding estimators as defined in the previous subsections. In this section, we derive the joint limiting distribution of δ̃_n and ρ̃_n. Knowledge of the joint asymptotic distribution of all model parameters will then enable the researcher to test for the presence of network spillovers in any of the dependent variables, explanatory variables, or disturbances in the system.
As shown in the proof of the next theorem, the joint limiting distribution of the estimators will depend on the joint limiting distribution of the vector of linear and quadratic forms [η'_n, ξ'_n]', with η_n = [η'_1,n,...,η'_G,n]' and ξ_n = [ξ'_1,n,...,ξ'_G,n]', where η_g,n = n^{-1/2} Σ_{h=1}^{G} T'_gh,n ε_h,n and the ξ_g,n are defined in (20). Let Ψ_n denote the VC matrix of this vector, where the (r,s)th element of its block Ψ^ρρ_gh,n is given above. Analogous to (24), consider the corresponding estimator, with T̃_gh,n being some estimator for T_gh,n and ũ_g,n = y_g,n − Z_g,n δ̃_g,n. Next, consider the estimators α̃_g,1,n,...,α̃_g,S,n; our estimator for Ω_n can then be formulated accordingly. We now have the following theorem concerning the joint limiting distribution of δ̃_n − δ_n and ρ̃_n − ρ_n.
THEOREM 4 (Asymptotic Normality). Let ρ̃_n = [ρ̃'_1,n,...,ρ̃'_G,n]', where ρ̃_g,n = ρ̃_g,n(ϒ̃_g,n) denotes the GMM estimator for ρ_g,n defined by (16), and let δ̃_n = [δ̃'_1,n,...,δ̃'_G,n]' be an estimator for δ_n, where each δ̃_g,n is asymptotically linear in ε_n. Then, given Assumptions 1-10, n^{-1} T̃'_gh,n T̃_kl,n − n^{-1} T'_gh,n T_kl,n = o_p(1), and given that λ_min(Ψ_n) ≥ c for some c > 0, we have that n^{1/2}([δ̃'_n, ρ̃'_n]' − [δ'_n, ρ'_n]') is asymptotically distributed as N(0, Ω_n), where d = Σ_{g=1}^{G}(G_g + K_g + p_g + q_g) is the dimension of the parameter vector. Furthermore, Ω̃_n − Ω_n = o_p(1). Theorem 4 implies that the difference between the joint cumulative distribution function of the estimators of all model parameters and that of N(0, Ω_n) converges pointwise to zero, so that using the latter as an approximation of the former is justified. The theorem also states that Ω̃_n is a consistent estimator of Ω_n. Of course, since the marginal distribution of a multivariate normal distribution is normal, the above theorem also establishes the limiting distribution of any subvector of [δ̃'_n, ρ̃'_n]'.

LIMITED AND FULL INFORMATION TWO-STEP ESTIMATORS
In the previous section, we developed generic results regarding the asymptotic properties of two-step GMM estimators for the parameters of model (1). The results show that the limiting distribution of GMM estimators for ρ_g,n, which employ an initial estimator for δ_g,n in computing the estimated residuals, will generally depend on the limiting distribution of the latter. Thus, establishing the proper asymptotic theory for specific estimators is "delicate." In this section, we define specific limited and full information estimators and provide specific expressions for their asymptotic VC matrices.

Definition of Limited Information Estimators
In the following, we define, in a sequence of steps, a specific generalized spatial 2SLS (GS2SLS) estimator of δ_g,n and a GMM estimator of ρ_g,n based on GS2SLS residuals. Without loss of generality, we focus on the estimation of the parameters of the gth equation.
Step 1a: 2SLS estimator of δ_g,n. As a first step, we apply 2SLS to the gth equation of the untransformed model (8), using the instrument matrix H_n in Assumptions 5 and 6, to estimate δ_g,n. The 2SLS estimator, say δ̃_g,n, is then defined as

δ̃_g,n = (Ẑ'_g,n Z_g,n)^{-1} Ẑ'_g,n y_g,n,   where Ẑ_g,n = P_{H_n} Z_g,n.

Step 1b: Initial GMM estimator of ρ_g,n based on 2SLS residuals. Let ũ_g,n = ũ_g,n(δ̃_g,n) = y_g,n − Z_g,n δ̃_g,n denote the 2SLS residuals of the gth equation, and let m^ρ_g,n(ρ_g, δ̃_g,n) denote the corresponding sample moment vector as defined in (15). Our initial GMM estimator for ρ_g,n is now defined as the minimizer of the corresponding unweighted objective over [−a_ρ, a_ρ]^{q_g}, with a_ρ ≥ 1.
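Step 1a amounts to the familiar instrumental variables computation. A minimal numpy sketch (dense matrices, no numerical safeguards; the function name is ours) is:

```python
import numpy as np

def s2sls(y, Z, H):
    """Spatial 2SLS sketch: project Z onto col(H) and regress,
    delta = (Zhat' Z)^{-1} Zhat' y  with  Zhat = P_H Z."""
    PH = H @ np.linalg.solve(H.T @ H, H.T)   # projection matrix P_H
    Zhat = PH @ Z
    return np.linalg.solve(Zhat.T @ Z, Zhat.T @ y)
```

When Z is exogenous and H = Z, the projection is the identity on col(Z) and the estimator reduces to OLS, which offers a quick sanity check.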
Step 2a: GS2SLS estimator of δ_g,n. Analogous to Kelejian and Prucha (1998), we next compute a GS2SLS estimator of δ_g,n, say δ̂_g,n = δ̂_g,n(ρ̃_g,n). This estimator is defined as the 2SLS estimator of the gth equation of the spatially Cochrane-Orcutt transformed model (10) with ρ_g,n replaced by ρ̃_g,n, i.e.,

δ̂_g,n = [Ẑ*_g,n(ρ̃_g,n)' Z*_g,n(ρ̃_g,n)]^{-1} Ẑ*_g,n(ρ̃_g,n)' y*_g,n(ρ̃_g,n),

where Ẑ*_g,n(ρ̃_g,n) = P_{H_n} Z*_g,n(ρ̃_g,n). We shall also utilize the corresponding estimator for the (g,g)th block of Ψ^δδ_n (corresponding to δ_g,n).

Step 2b: Efficient GMM estimator of ρ_g,n based on GS2SLS residuals. Let û_g,n = y_g,n − Z_g,n δ̂_g,n denote the GS2SLS residuals of the gth equation, and let m^ρ_g,n(ρ_g, δ̂_g,n) denote the corresponding sample moment vector as defined in (15). The corresponding efficient GMM estimator for ρ_g,n based on GS2SLS residuals is then defined as in (16) with weighting matrix (Ψ̂^ρρ_gg,n)^{-1}, where Ψ̂^ρρ_gg,n = (ψ̂^ρρ_rs,gg,n) is an estimator of the VC matrix of the limiting distribution of the normalized sample moments n^{1/2} m^ρ_g,n(ρ_g,n, δ̂_g,n). The claim that (Ψ̂^ρρ_gg,n)^{-1} provides the efficient weighting of the sample moments will be established by Theorem 5.
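Step 2a can be sketched as follows, assuming dense weights matrices M_r and a first-step estimate of ρ; the helper names are our own, not the paper's.

```python
import numpy as np

def spatial_cochrane_orcutt(y, Z, M_list, rho):
    """Spatial Cochrane-Orcutt transform (cf. (10)):
    y* = (I - sum_r rho_r M_r) y, and likewise for Z."""
    n = y.shape[0]
    R = np.eye(n)
    for r, M in zip(rho, M_list):
        R = R - r * M
    return R @ y, R @ Z

def gs2sls(y, Z, H, M_list, rho_hat):
    """GS2SLS sketch (Step 2a): 2SLS applied to the transformed model,
    with rho replaced by a first-step estimate."""
    ys, Zs = spatial_cochrane_orcutt(y, Z, M_list, rho_hat)
    PH = H @ np.linalg.solve(H.T @ H, H.T)
    Zhat = PH @ Zs
    return np.linalg.solve(Zhat.T @ Zs, Zhat.T @ ys)
```

With rho_hat = 0 the transform is the identity, and GS2SLS coincides with plain 2SLS on the untransformed equation.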

Asymptotic Properties of Limited Information Estimators
In this subsection, we derive results concerning the joint limiting distribution of the GS2SLS estimators ρ̂_g,n and δ̂_g,n by applying the generic limit theory developed in Theorem 4. In preparation, we first specialize the expressions for the estimators of the (g,g)th blocks of Ψ^δρ_n, Ψ^ρρ_n and Ω^δδ_n, Ω^δρ_n, Ω^ρρ_n, as implied by the specific structure of δ̂_g,n and ρ̂_g,n, with Ĵ_g,n = J_g,n(ρ̂_g,n). We now have the following result concerning the joint asymptotic distribution of δ̂_g,n and ρ̂_g,n.
THEOREM 5 (Joint Asymptotic Normality of ρ̂_g,n and δ̂_g,n). Suppose Assumptions 1-8 hold, and that the smallest eigenvalues of Ψ_n are bounded away from zero. Then, ρ̂_g,n is efficient among the class of GMM estimators based on GS2SLS residuals, and n^{1/2}([δ̂'_g,n, ρ̂'_g,n]' − [δ'_g,n, ρ'_g,n]') is asymptotically normal with a VC matrix that is consistently estimated by its sample counterpart. In the above theorem, the estimators for the asymptotic VC matrix of ρ̂_g,n and δ̂_g,n employ ρ̂_g,n as an estimator for ρ_g,n. The theorem continues to hold if ρ̂_g,n is replaced by ρ̃_g,n (or any other consistent estimator).

Definition of Full Information Estimators
In the previous section, we discussed GS2SLS estimation, where the parameters of each equation are estimated separately from the spatially Cochrane-Orcutt transformed model (10). In the following, we consider full information estimation, where all parameters are estimated jointly from the stacked spatially Cochrane-Orcutt transformed model (11). In particular, we will consider a GS3SLS estimator.
Step 3a: GS3SLS estimator of δ_n. The GS3SLS estimator of δ_n based on the stacked spatially Cochrane-Orcutt transformed model (11) is defined, in analogy to classical 3SLS, by applying system instrumental variables estimation with weighting matrix Σ̂^{-1} ⊗ I_n to the transformed system. Below, we shall also utilize the corresponding estimator Ψ̂^δδ_n for Ψ^δδ_n, and we denote the (g,h)th blocks of Ψ^δδ_n and Ψ̂^δδ_n by Ψ^δδ_gh,n and Ψ̂^δδ_gh,n, respectively.
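Step 3a can be sketched as follows. The weighting by the inverse of an estimated innovation covariance, Kronecker-multiplied with I_n, follows the classical 3SLS construction; this is a hedged sketch under that assumption, with the transformed and fitted regressor matrices supplied as arguments, not the paper's exact implementation.

```python
import numpy as np

def gs3sls(y_list, Zs_list, Zhat_list, Sigma_hat):
    """Full-information GS3SLS sketch (Step 3a): stack the G transformed
    equations block-diagonally and solve
    delta = [Zhat' (S^{-1} kron I) Z*]^{-1} Zhat' (S^{-1} kron I) y*."""
    n = y_list[0].shape[0]
    G = len(y_list)
    y = np.concatenate(y_list)
    ks = [Z.shape[1] for Z in Zs_list]
    Zstar = np.zeros((G * n, sum(ks)))
    Zhat = np.zeros_like(Zstar)
    row = col = 0
    for Z, Zh, k in zip(Zs_list, Zhat_list, ks):
        Zstar[row:row + n, col:col + k] = Z   # block-diagonal stacking
        Zhat[row:row + n, col:col + k] = Zh
        row += n
        col += k
    W = np.kron(np.linalg.inv(Sigma_hat), np.eye(n))
    return np.linalg.solve(Zhat.T @ W @ Zstar, Zhat.T @ W @ y)
```

For G = 1 the weighting matrix is scalar and cancels, so the estimator collapses to the single-equation IV estimator, which provides a simple check.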
Step 3b: GMM estimator of ρ_g,n based on GS3SLS residuals. In a final step, we compute a further GMM estimator of ρ_g,n based on the GS3SLS residuals û_g,n = y_g,n − Z_g,n δ̂_g,n, where δ̂_g,n denotes the gth component of the GS3SLS estimator δ̂_n. Let m^ρ_g,n(ρ_g, δ̂_g,n) denote the corresponding sample moment vector as defined in (15). The corresponding GMM estimator for ρ_g,n based on GS3SLS residuals is then defined analogously to (16).

Asymptotic Properties of Full Information Estimators
In this subsection, we derive results concerning the joint limiting distribution of the GS3SLS estimators ρ̂_n and δ̂_n by again applying the generic limit theory developed in Theorem 4. In preparation, we first specialize the expressions for the estimators of Ψ^δρ_n, Ψ^ρρ_n and Ω^δδ_n, Ω^δρ_n, Ω^ρρ_n, as implied by the specific structure of ρ̂_n and δ̂_n, with Ĵ_g,n = J_g,n(ρ̂_g,n). The next theorem establishes the joint limiting distribution of ρ̂_n and δ̂_n.
THEOREM 6 (Joint Asymptotic Normality of ρ̂_n and δ̂_n). Suppose Assumptions 1-8 hold, and that the smallest eigenvalues of Ψ_n are bounded away from zero. Then n^{1/2}([δ̂'_n, ρ̂'_n]' − [δ'_n, ρ'_n]') is asymptotically normal, and the estimators ρ̂_g,n based on GS3SLS residuals are efficient within their class.
In the above theorem, the estimators for the asymptotic VC matrix of ρ̂_n and δ̂_n employ ρ̂_g,n as an estimator for ρ_g,n. The theorem continues to hold if ρ̂_g,n is replaced by ρ̃_g,n (or any other consistent estimator).

LIMITED AND FULL INFORMATION ONE-STEP ESTIMATORS
In the following, we discuss the one-step analogs of the two-step estimators considered in the previous section. For simplicity, we assume the availability of a consistent estimator for Σ, say Σ̃_n = (σ̃_gh,n).
Toward defining our one-step limited information GMM estimator, consider the stacked moment vector m_g,n(ρ_g, δ_g), whose components are the vectors of linear and quadratic sample moments for the gth equation as defined in (12) and (13). Then Em_g,n(ρ_g,n, δ_g,n) = 0. Explicit expressions for the submatrices of the corresponding VC matrix are given in the proof of the theorem.

Let Ψ̃^LL_gg,n and Ψ̃^QQ_gg,n denote the corresponding estimators, where σ_gg,n is replaced by some consistent estimator σ̃_gg,n. Then the one-step limited information GMM estimator is defined as the minimizer of the GMM objective formed from the stacked linear and quadratic moment conditions; see (36). For ease of distinction from the two-step limited information estimators defined in the previous section, we refer to this estimator as the linear-quadratic GS2SLS (LQ-GS2SLS) estimator. A simple adaptation of the methodology used to derive the limiting distribution of the two-step estimators delivers the limiting distribution of the one-step estimator. The one-step estimator defined in (36) is efficient among GMM estimators based on the moment conditions Em_g,n(ρ_g,n, δ_g,n) = 0. However, a comparison of the above expressions for the asymptotic VC matrix of the one-step estimator with those for the two-step estimator given by Theorem 5 reveals that in the case where Z_g,n contains only exogenous variables, and thus α_g,n = 0, both estimators have the same limiting distribution.
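The stacked linear and quadratic moments underlying the one-step estimator can be sketched, for a single equation, as follows. The assembly of the moment vector and the quadratic-form objective are illustrative, with the weighting matrix Xi_inv supplied by the user; all names are ours.

```python
import numpy as np

def lq_moments(theta, y, Z, H, M_list, A_list):
    """Stacked sample moments at theta = (rho, delta):
    linear part H' u*(theta)/n and quadratic parts
    u*(theta)' A_s u*(theta)/n, with u* the spatially
    Cochrane-Orcutt transformed residual.  Hedged sketch of the
    moment vector underlying the LQ-GS2SLS estimator."""
    q = len(M_list)
    rho, delta = theta[:q], theta[q:]
    n = y.shape[0]
    u = y - Z @ delta
    R = np.eye(n)
    for r, M in zip(rho, M_list):
        R = R - r * M
    us = R @ u                                   # transformed residuals
    lin = H.T @ us / n
    quad = np.array([us @ A @ us / n for A in A_list])
    return np.concatenate([lin, quad])

def lq_objective(theta, y, Z, H, M_list, A_list, Xi_inv):
    # GMM objective m' Xi_inv m over (rho, delta) jointly
    m = lq_moments(theta, y, Z, H, M_list, A_list)
    return float(m @ Xi_inv @ m)
```

At the true parameters, the population counterparts of these moments are zero, which is the condition Em_g,n(ρ_g,n, δ_g,n) = 0 stated above.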
Our setup contains as an important special case models without spatial lags in the disturbances (i.e., where ρ_g,n = 0 is known). Obviously, the LQ-GS2SLS estimator, which combines both linear and quadratic moment conditions, remains well defined in this case, and its asymptotic VC matrix obtains as a special case. The use of quadratic moment conditions may be especially beneficial in cases where identification by the linear moments is weak; see, e.g., Kuersteiner and Prucha (2020) for a recent discussion.
Toward defining our one-step full information GMM estimator, consider the stacked sample moment vector across all equations, and let Ψ̃^LL_n and Ψ̃^QQ_n denote the corresponding estimators, where σ_gh,n is replaced by some consistent estimator σ̃_gh,n. Then the one-step full information GMM estimator is defined analogously. For ease of distinction from the two-step full information estimators defined in the previous section, we refer to this estimator as the LQ-GS3SLS estimator. Again, a simple adaptation of the methodology used to derive the limiting distribution of the two-step estimators delivers the limiting distribution of the one-step estimator. A comparison of the above expression for the asymptotic VC matrix of the one-step full information estimator with that for the two-step estimator given by Theorem 6 reveals that in the case where the Z_g,n contain only exogenous variables, and thus α_g,n = 0, both estimators have the same limiting distribution.

CONCLUDING REMARKS
This paper develops estimation methodologies for a cross-sectional simultaneous equation model in G variables, where simultaneity stems from interdependencies among the G variables as well as from network interdependencies. Taking guidance from the spatial literature, network interdependencies are modeled in the form of weighted averages. For simplicity, and consistent with the spatial literature, we refer to those weighted averages as spatial lags. We allow for higher-order spatial lags in the endogenous variables, the exogenous variables, and the disturbances. As a consequence, the model provides significant flexibility in modeling network effects.
The paper develops an estimation theory for both limited and full information generalized method of moments estimators, which utilize both linear and quadratic moment conditions. We consider both two- and one-step estimators. An important aim in specifying our estimators was that they remain feasible even for very large data sets and general weights matrices.
To explore the small sample properties of our estimators, we conducted a Monte Carlo study. The details of the design and the results of that study are given in Appendix G in the Online Supplementary Appendix. In general, the results are encouraging. We consider scenarios where the parameters are well identified by the linear moment conditions, as well as scenarios where they are not. The study includes the maximum likelihood (ML) estimator for comparison. In the well-identified case, the biases of all considered estimators are fairly small. As expected, the ML estimator has the smallest root mean squared error (RMSE). However, in general, the ML estimator dominates the GS3SLS estimator only slightly, and the GS3SLS estimator dominates the GS2SLS estimator. The differences in RMSE are most pronounced for the estimates of the autoregressive parameters in the disturbance process. For the scenarios where identification by the linear moment conditions alone is weak, the GS2SLS and GS3SLS estimators can, as expected, be substantially biased. In these scenarios, the LQ-GS2SLS and LQ-GS3SLS estimators can greatly outperform the GS2SLS and GS3SLS estimators. However, for the well-identified scenarios, the benefit of combining linear and quadratic moment conditions seems limited.
We expect the model to be helpful for empirical research in both macro- and microeconomic settings, as well as in areas outside of economics. As illustrated in the paper, one potential application is the modeling of social interactions in different activities, e.g., the level of different physical activities among groups of friends connected via an activity tracker such as Fitbit. Suggestions for future research include an extension of the methodology to panel data, as well as an extension that allows for measurement errors in the data.

A. APPENDIX: PRELIMINARY RESULTS
In this appendix, we collect some preliminary results. All proofs are relegated to the Online Supplementary Appendix.

A.1. Asymptotic Linearity of S2SLS, GS2SLS, and GS3SLS
Assumption 10 postulates that the estimators of the regression parameters are asymptotically linear. In the following, we show that the S2SLS, GS2SLS, and GS3SLS estimators are indeed asymptotically linear.

LEMMA A.1. Let A_n = (a_ij,n) be some m_n × m_n real matrix whose row sums of absolute elements are bounded uniformly in n by some finite constant. Let μ_n = (μ_1,n,...,μ_{m_n},n)' and η_n = (η_1,n,...,η_{m_n},n)' be some m_n × 1 random vectors with sup_n max_{i≤m_n} E|μ_i,n|^p < ∞ and sup_n max_{i≤m_n} E|η_i,n|^p < ∞ for some p > 1, and where C does not depend on i, j, and n.
LEMMA A.3. Let A_n = (a_ij,n) be some n × n matrix where the row and column sums of the absolute elements are bounded uniformly in n by some finite constant.

LEMMA A.4. Suppose Assumptions 1-4, 5, and 6 hold. Consider the S2SLS estimator δ̃_g,n, where Ẑ_g,n = P_{H_n} Z_g,n. Then δ̃_g,n is asymptotically linear with T_gg,n = F_gg,n P_gg,n, where (c) P_gg is a finite matrix and P_gg,n − P_gg = o_p(1), and the relevant smallest eigenvalues exceed some c > 0 for all large n.
LEMMA A.5. Suppose Assumptions 1-4, 5, and 6 hold. Consider the GS2SLS estimator δ̂_g,n = δ̂_g,n(ρ̃_g,n), where ρ̃_g,n is any consistent estimator for ρ_g,n.
Then, (a) δ̂_g,n is asymptotically linear with T*_gg,n = F*_gg,n P*_gg,n.

LEMMA A.6. Suppose Assumptions 1-4, 5, and 6 hold. Consider the GS3SLS estimator δ̂_n, where ρ̂_n = (ρ̂'_1,n,...,ρ̂'_G,n)' is any consistent estimator for ρ_n and Σ̂_n is any consistent estimator for Σ. Then δ̂_n is asymptotically linear.

A.2. Auxiliary Results for Linear Quadratic Forms
In the following, we establish some auxiliary results on the relationship between linear and quadratic forms based on some n × 1 disturbance vector u_n and corresponding forms based on a predictor ũ_n.
Assumption A.1. For n ≥ 1, the n × 1 disturbance vector u_n is generated as u_n = R_n^{-1} e_n, where R_n is a nonstochastic nonsingular n × n matrix, the row and column sums of the absolute elements of R_n and R_n^{-1} are bounded uniformly by some finite constant, and the innovations e_n = (e_1,n,...,e_n,n)' have the following properties: for each n ≥ 1, the random variables e_1,n,...,e_n,n are totally independent with Ee_i,n = 0, E(e²_i,n) = σ² > 0, and sup_{1≤i≤n, n≥1} E|e_i,n|^{4+υ} < ∞ for some υ > 0.
Assumption A.2. The predictor ũ_n for u_n satisfies ũ_n − u_n = D_n Δ_n, where D_n = (d_ij,n) is an n × p random matrix and Δ_n is a p × 1 random vector.
Assumption A.3. For any n × n real matrix A*_n whose row and column sums are bounded uniformly in absolute value, D_n satisfies the corresponding boundedness condition.

Remark. The above assumptions are formulated in a general fashion, so that the results on the properties of linear quadratic forms established below can also be utilized in a variety of contexts. For an interpretation of the results specific to this paper, suppose that u_n corresponds to the disturbance term of the gth equation of the model defined by (1)-(8). Then the quantities considered in Assumptions A.1-A.3 should be interpreted as u_n = u_g,n, D_n = −Z_g,n, R_n = I_n − Σ_{r∈I_g,ρ} ρ_g,r,n M_r,n, e_n = ε_g,n, and Δ_n = δ̃_g,n − δ_g,n, where δ̃_g,n is some estimator for the parameter vector δ_g,n. Observe that under Assumptions 1-4, and given that δ̃_g,n is an n^{1/2}-consistent estimator, these quantities clearly satisfy Assumptions A.1-A.3 in light of Lemmata A.1 and A.2.

LEMMA A.7. Let A*_n be an n × n real matrix whose row and column sums are bounded uniformly in absolute value, and suppose Assumptions A.1 and A.2 hold; then conclusions (a)-(c) hold. We next state an additional assumption regarding Δ_n, which is satisfied by the various IV estimators considered in the paper; compare Lemmata A.4-A.6. This assumption then permits a specialization of the r.h.s. expression for n^{-1/2} ũ'_n A*_n ũ_n given in Lemma A.7(c) for the case where A*_n = R'_n A_n R_n, which will be helpful for deriving its limiting distribution.

Assumption A.4. The vector Δ_n is asymptotically linear with T_h,n = F_h,n P_h,n, where the F_h,n = (f_his,n) are n × p_F real nonstochastic matrices with sup_n n^{-1} Σ_{i=1}^{n} |f_his,n|^η < ∞ for some η > 2, h = 1,...,G, s = 1,...,p_F, and the P_h,n = (p_hkl,n) are p_F × p real nonstochastic matrices whose elements are uniformly bounded in absolute value.
We now have the following specialization of Lemma A.7(c).
LEMMA A.8. Suppose Assumptions A.1-A.3 hold, where Δ_n is asymptotically linear satisfying Assumption A.4, and suppose e_n = ε_g,n. Furthermore, let A_n be an n × n real matrix whose row and column sums are bounded uniformly in absolute value.
(a) Then the expansion stated above holds. (b) If, furthermore, the diagonal elements of A_n are zero, then the limiting variance involves the terms σ_hl a'_h,n a_l,n.

B. APPENDIX: PROOFS FOR SECTION 4
Proof of Theorem 1. The existence and measurability of ρ̃_g,n are assured by, e.g., Lemma 3.4 in Pötscher and Prucha (1997). The objective function of the weighted nonlinear least-squares estimator and its corresponding nonstochastic counterpart are given by R̃_n(ρ_g) and R_n(ρ_g), respectively, where the quantities appearing in these expressions are defined before the theorem in the text. To prove the consistency of ρ̃_g,n, we show that the conditions of, e.g., Lemma 3.1 in Pötscher and Prucha (1997) are satisfied for the problem at hand. We first show that ρ_g,n is an identifiably unique sequence of minimizers of R_n. Observe that R_n(ρ_g) ≥ 0 and that R_n(ρ_g,n) = 0, since γ_g,n = Γ_g,n r_g,n(ρ_g,n) by (14). Utilizing Assumptions 8 and 9, we obtain a lower bound on R_n with some constant λ* > 0. Hence, for every ε > 0 and n, the infimum of R_n over the complement of an ε-neighborhood of ρ_g,n is bounded away from zero. Next, observe that the elements of Γ_g,n and Γ̃_g,n are all of the form n^{-1} Eu'_n A_n u_n and n^{-1} ũ'_n A_n ũ_n, where the row and column sums of A_n are bounded uniformly in absolute value. Recall that u_g,n = Σ_{r∈I_g,ρ} ρ_g,r,n M_r,n u_g,n + ε_g,n and observe that ũ_g,n − u_g,n = −Z_g,n(δ̃_g,n − δ_g,n). By Assumptions 1 and 2, the row and column sums of the absolute elements of Σ_{r∈I_g,ρ} ρ_g,r,n M_r,n and (I_n − Σ_{r∈I_g,ρ} ρ_g,r,n M_r,n)^{-1} are bounded by some finite constant. By Assumption 4, the elements of ε_g,n are totally independent with zero mean, positive variance, and finite 4 + υ absolute moments for some υ > 0.
Furthermore, observe that by Lemma A.2 the fourth absolute moments of the elements of Z_g,n are uniformly bounded by a finite constant, and that n^{1/2}(δ̃_g,n − δ_g,n) = O_p(1). It now follows immediately from Lemma A.7(a) that Γ̃_g,n − Γ_g,n →_p 0, that the elements of Γ_g,n are O(1) and, consequently, that the elements of Γ̃_g,n are O_p(1). The elements of ϒ̃_n and ϒ_n have analogous properties in light of condition (b) in the theorem. Given this, it follows from the above inequality that R̃_n(ω, ρ_g) − R_n(ρ_g) converges to zero uniformly over the optimization space, i.e., sup_{ρ_g} |R̃_n(ω, ρ_g) − R_n(ρ_g)| = o_p(1) as n → ∞. The consistency of ρ̃_g,n now follows directly from Lemma 3.1 in Pötscher and Prucha (1997).
Proof of Theorem 2. We have shown in Theorem 1 that the GMM estimator ρ̃_g,n defined in (16) is consistent. Apart from a set whose probability tends to zero, the estimator satisfies the first-order condition stated above. Substituting the mean value theorem expression, where ρ̄_g,n is some intermediate value, into the first-order condition yields the expansion above. Observe that the matrices appearing in this expansion converge to their nonstochastic counterparts as n → ∞, are O_p(1), and their nonstochastic counterparts are O(1); in particular, the largest eigenvalues of the latter are bounded by some finite constant λ**, while, in light of Assumptions 8 and 9, the smallest eigenvalues are bounded from below by some λ* > 0.
Hence 0 < λ max { −1 g,n } = 1/λ min ( g,n ) ≤ 1/λ * < ∞, and thus we also have −1 g,n = O(1).Let + g,n denote the generalized inverse of g,n .It then follows as a special case of Lemma F1 in Pötscher and Prucha (1997) that g,n is nonsingular eventually with probability tending to one, that In light of the above discussion, the first term on the r.h.s. is zero on ω-sets of probability tending to one.This yields Observe that In light of ( 15), the elements of n 1/2 m ρ g,n (ρ g,n , δ g,n ) are of the form (s = 1,...,S) with T gh,n ε h,n + o p (1).
Recall that these quantities satisfy Assumptions A.1-A.3 in light of Lemmata A.1-A.3. Hence, it follows from Lemma A.8 that the sample moments have the stated linear-quadratic representation. Furthermore, the lemma implies that for some η > 2 the sample moments of the absolute elements of T_h,n α_g,s,n are uniformly bounded. Let Ψ^ρρ_gg,n = (ψ^ρρ_rs,gg,n) denote the VC matrix of the vector of linear quadratic forms on the r.h.s. of (B.8); then, observing that the diagonal elements of A_s,n are zero and that the VC matrix of ε_n is Σ ⊗ I_n, the stated expression follows from Lemma A.1 in Kelejian and Prucha (2010). We note that λ_min(Ψ^ρρ_gg,n) ≥ const > 0 by assumption. Since the matrices A_s,n, the vectors of the linear forms, and, in light of Assumption 4, the innovations ε_n satisfy all of the remaining assumptions of the central limit theorem for vectors of linear quadratic forms given in Kelejian and Prucha (2010), asymptotic normality now follows, observing that the relevant matrix equals J'_g,n ϒ_g,n J_g,n. This establishes (19). Since all of the nonstochastic terms on the r.h.s. of (B.11) are O(1), this establishes the last claim of the theorem.
Proof of Lemma 1. Observe that ũ_g,n = u_g,n − Z_g,n Δ_g,n with Δ_g,n = δ̃_g,n − δ_g,n; the stated expansions then follow. By Assumption 4, we have σ_gh = Σ_{l=1}^{G} σ*_lg σ*_lh and ε_g = Σ_{l=1}^{G} σ*_lg v_l. Since the elements of v are i.i.d. (0,1), the claimed convergence follows, and thus ϒ̃_g,n = O_p(1). In proving Theorem 2, we have verified that J̃_g,n − J_g,n →_p 0, J_g,n = O(1), and J̃_g,n = O_p(1).

Proof. The result follows immediately from Lemma A.1 in Kelejian and Prucha (2010), observing that the diagonal elements of A* = (Σ* ⊗ I)A(Σ*' ⊗ I) and B* = (Σ* ⊗ I)B(Σ*' ⊗ I) are zero.

Proof of Theorem 4. Observe that η_n and ξ_n are vectors of linear-quadratic forms; a direct calculation of their variances and covariances verifies the expressions for the elements of Ψ_n = cov(ξ_n, ξ_n).
In light of Assumption 10 and Theorem 2, the vector of linear-quadratic forms [η'_n, ξ'_n]' is readily seen to satisfy the assumptions of Theorem A.1 in Kelejian and Prucha (2010). Hence, the claimed joint asymptotic normality follows from that central limit theorem. Next recall that in proving Theorem 3, we demonstrated that α̂_{g,r,n} − α_{g,r,n} = o_p(1), α_{g,r,n} = O(1), and α̂_{g,r,n} = O_p(1), and furthermore that σ̂_{gh,n} − σ_{gh} = o_p(1). Consequently ϒ̂_{g,n} − ϒ_{g,n} = o_p(1), and thus ϒ̂_{g,n} = O_p(1). Also recall from the proof of Theorem 3 that Ĵ_{g,n} − J_{g,n} = o_p(1), J_{g,n} = O(1), and Ĵ_{g,n} = O_p(1), and furthermore that (Ĵ'_{g,n} ϒ̂_{g,n} Ĵ_{g,n})^{-1} − (J'_{g,n} ϒ_{g,n} J_{g,n})^{-1} = o_p(1).

Proof. To prove the claim, we verify the assumptions of Theorem 1. Assumptions 1-8 are maintained. Assumption 9 holds trivially with ϒ̂_{g,n} = ϒ_{g,n} = I. Observing furthermore that by Lemma A.4 we have n^{1/2}(δ̂_{g,n} − δ_{g,n}) = O_p(1) completes the proof.
Proof of Theorem 5. The proof of Theorem 5 is based on the generic limit theory developed in Theorem 4. In light of that theorem, it proves convenient to first derive the limiting distribution of δ̂_n = (δ̂'_{1,n},...,δ̂'_{G,n})' and ρ̂_n = (ρ̂'_{1,n},...,ρ̂'_{G,n})'. The limiting distribution of δ̂_{g,n} and ρ̂_{g,n} is then obtained as a trivial specialization. For clarity, we divide the remainder of the proof into several parts.
The remaining conditions of Assumption 10 are also seen to hold in light of Lemma A.5. Part 2: (Specialized Expressions for Ψ_n and the Corresponding Estimator) Next observe that the components of Ψ_n defined in (26) simplify in obvious ways in that T_n = diag_g(T_{gg,n}), and thus all terms involving a T_{gh,n} with g ≠ h are zero. Observe from part 3 of the proof that n^{-1} T'_{gg,n} T_{gg,n} = [n^{-1} Z*'_{g,n}(ρ̂_{g,n}) Z*_{g,n}(ρ̂_{g,n})]^{-1} and that n^{-1} T̂'_{gg,n} T̂_{gg,n} − n^{-1} T'_{gg,n} T_{gg,n} = o_p(1). The asymptotic normality result of the theorem now follows immediately from (C.1), observing that Theorem 4 also established the consistency of the VC estimators.
Proof of Theorem 6. The proof is again based on the generic limit theory developed in Theorem 4. For clarity, we divide the proof, analogous to the proof of Theorem 5, into several parts.
Part 1: (Verification of Assumption 10 for δ̂_n) In light of Lemma A.6, we have n^{1/2}(δ̂_n − δ_n) = n^{-1/2} T'_n ε_n + o_p(1). Let T_{gh,n} and P_{gh,n} denote the (g,h)th block of T_n and P_n, respectively. Then, clearly, T_{gh,n} = H_n P_{gh,n}.
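The blockwise algebra used throughout these proofs, namely that the (g,h)th block of T'_n(Σ ⊗ I_n)T_n is the double sum over u,v of σ_uv times products of the corresponding blocks of T_n, can be verified numerically. The sketch below (dimensions and blocks hypothetical, with T[u][g] denoting the (u,g)th block of T_n) checks the identity:

```python
import numpy as np

rng = np.random.default_rng(2)
G, n, p = 2, 4, 3                 # equations, sample size, parameters per block

# T_n stacked from G x G blocks; T[u][g] is the (u,g)th block, each n x p.
T = [[rng.normal(size=(n, p)) for g in range(G)] for u in range(G)]
T_big = np.block(T)               # (G*n) x (G*p)

Sigma = rng.normal(size=(G, G))
Sigma = Sigma @ Sigma.T           # symmetric positive definite VC matrix

# Full product T_n' (Sigma kron I_n) T_n.
M = T_big.T @ np.kron(Sigma, np.eye(n)) @ T_big

# (g,h)th block of M equals sum_u sum_v sigma_uv T[u][g]' T[v][h].
for g in range(G):
    for h in range(G):
        block = sum(Sigma[u, v] * T[u][g].T @ T[v][h]
                    for u in range(G) for v in range(G))
        assert np.allclose(M[g * p:(g + 1) * p, h * p:(h + 1) * p], block)
```

This is the computation behind the specialized expressions for the submatrices of Ψ_n: once T_n is block diagonal (as in Theorem 5), all cross terms with g ≠ h vanish.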

Now let T_{gh,n} and P_{gh,n} denote the (g,h)th block of T_n and P_n; then T_{gh,n} = F_{gh,n} P_{gh,n} with F_{gh,n} = H_n. The remaining conditions of Assumption 10 are then seen to hold in light of Lemma A.6. Part 2: (Specialized Expressions for Ψ_n and the Corresponding Estimator) Specialized expressions for each of the submatrices Ψ^{δδ}_n, Ψ^{δρ}_n, Ψ^{ρρ}_n of Ψ_n defined in (26) are readily found by substituting into those expressions the formulas for T_n given in (C.2), and by observing that Σ_{u=1}^{G} Σ_{v=1}^{G} σ_{uv,n} T'_{gu,n} T_{hv,n} represents the (g,h)th block of Ψ^{δδ}_n = T'_n(Σ ⊗ I)T_n.

SUPPLEMENTARY MATERIAL

To view supplementary material for this article, please visit: https://doi.org/10.1017/S026646662200007X.