Regression of binary network data with exchangeable latent errors

Published online by Cambridge University Press:  03 July 2023

Frank W. Marrs*
Affiliation:
Los Alamos National Laboratory, Los Alamos, NM, USA
Bailey K. Fosdick
Affiliation:
Department of Biostatistics & Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
*Corresponding author: Frank W. Marrs; Email: fmarrs3@lanl.gov

Abstract

Undirected, binary network data consist of indicators of symmetric relations between pairs of actors. Regression models of such data allow for the estimation of effects of exogenous covariates on the network and for prediction of unobserved data. Ideally, estimators of the regression parameters should account for the inherent dependencies among relations in the network that involve the same actor. To account for such dependencies, researchers have developed a host of latent variable network models; however, estimation of many latent variable network models is computationally onerous, and it may not be clear which model is the best basis for inference. We propose the probit exchangeable (PX) model for undirected binary network data that is based on an assumption of exchangeability, which is common to many of the latent variable network models in the literature. The PX model can represent the first two moments of any exchangeable network model. We leverage the EM algorithm to obtain an approximate maximum likelihood estimator of the PX model that is extremely computationally efficient. Using simulation studies, we demonstrate the improvement in estimation of regression coefficients of the proposed model over existing latent variable network models. In an analysis of purchases of politically aligned books, we demonstrate political polarization in purchase behavior and show that the proposed estimator significantly reduces runtime relative to estimators of latent variable network models, while maintaining predictive performance.
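
As a rough illustration of what a probit regression with exchangeable latent errors can look like, the following Python sketch simulates a small undirected binary network. It is not the authors' implementation. The additive error construction $\epsilon_{ij} = a_i + a_j + \xi_{ij}$, the restriction $0 \le \rho < 1/2$, and the function name `simulate_px_network` are assumptions made for this example; they are chosen so that every latent error has unit variance and two relations sharing an actor have covariance $\rho$.

```python
import random


def simulate_px_network(x, beta, rho, seed=0):
    """Simulate an undirected binary network from a probit model with
    exchangeable latent errors (illustrative sketch under stated assumptions).

    x[i][j] is the covariate list for the pair (i, j); only i < j is read.
    Assumed error structure: eps_ij = a_i + a_j + xi_ij, with
    Var(a_i) = rho and Var(xi_ij) = 1 - 2 * rho, so that Var(eps_ij) = 1
    and Cov(eps_ij, eps_ik) = rho for relations sharing one actor.
    """
    if not 0.0 <= rho < 0.5:
        raise ValueError("this construction requires 0 <= rho < 1/2")
    rng = random.Random(seed)
    n = len(x)
    # Actor-level effects induce the dependence between relations.
    a = [rng.gauss(0.0, rho ** 0.5) for _ in range(n)]
    y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            eps = a[i] + a[j] + rng.gauss(0.0, (1.0 - 2.0 * rho) ** 0.5)
            latent = sum(b * v for b, v in zip(beta, x[i][j])) + eps
            # Probit link: a relation is present when the latent value is positive.
            y[i][j] = y[j][i] = int(latent > 0.0)
    return y


# Hypothetical example: an intercept plus one binary pair-level covariate.
n = 6
x = [[[1.0, float((i + j) % 2)] for j in range(n)] for i in range(n)]
y = simulate_px_network(x, beta=[-0.5, 1.0], rho=0.25, seed=1)
```

Setting `rho=0.0` in this sketch gives independent latent errors, which corresponds to the standard probit regression used as a baseline in the comparisons.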

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the reused or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Algorithm 1. EMM estimation of the PX model

Figure 1. The left panel depicts performance in estimating $\beta$: RMSE between the EMM estimator and the MLE (RMSE$(\widehat{\beta }_{\textrm{MLE}} - \widehat{\beta }_{\textrm{EMM}})$), between the MLE and the truth (RMSE$(\widehat{\beta }_{\textrm{MLE}} -{\beta })$), and between the MLE and the standard probit estimator (RMSE$(\widehat{\beta }_{\textrm{MLE}} - \widehat{\beta }_{\textrm{Std. probit}})$). The right panel depicts performance in estimating $\rho$: RMSE between the MLE and the EMM estimator (RMSE$({\widehat{\rho }}_{\textrm{MLE}} -{\widehat{\rho }}_{\textrm{EMM}})$) and between the MLE and the truth (RMSE$({\widehat{\rho }}_{\textrm{MLE}} - \rho )$). The RMSEs are plotted as a function of the true values of $\rho$, and solid vertical lines denote Monte Carlo error bars. Some points obscure their Monte Carlo error bars.

Figure 2. The left panel depicts the RMSE in estimating $\beta$ using the EMM algorithm, MLE, and standard probit regression. The right panel depicts the same for $\rho$. The RMSEs are plotted as a function of the true values of $\rho$, and solid vertical lines denote Monte Carlo error bars.

Figure 3. Performance ($n^{1/2}$ RMSE) of estimators of $\beta$, for a given $\textbf{X}$, when generating from the PX model (top row) and the latent eigenmodel (LE; bottom row). Variability captured by the boxplots reflects variation in RMSE with $\textbf{X}$.

Figure 4. Average runtimes of various algorithms used on simulated data.

Figure 5. The left panel depicts $n^{1/2}$ RMSE in estimating $\beta _1$, using the EMM algorithm and standard probit regression, under a $t$ distribution for the errors. The right panel depicts $n^{1/4}$ RMSE in estimating $\rho$ using the EMM algorithm in the same simulation. Variability captured by the boxplots reflects variation in RMSE with $\textbf{X}$.

Figure 6. Krebs’ political books network (left) and out-of-sample performance in 10-fold cross-validation, as measured by area under the precision-recall curve (PRAUC, right), plotted against mean runtime in the cross-validation. The estimators are standard probit assuming independent observations (Std. probit), the PX model as estimated by the EMM algorithm (PX), the social relations model estimator (SRM), and the latent eigenmodel estimator (LE).

Table 1. Results of fitting the Krebs political books data using the EMM estimator for the PX model and the amen estimator for the social relations and latent eigenmodels (SRM and LE, respectively). Point estimates for the coefficients are given to the left of the vertical bar, and runtimes (in seconds) and minimum effective sample sizes across the coefficient estimates are given to the right.

Figure D1. PX model: Scaled bias and variance of estimators of ${\boldsymbol{\beta }}$ for a given $\textbf{X}$ when generating from the PX model. Variability captured by the boxplots reflects variation with $\textbf{X}$.

Figure D2. LE model: Scaled bias and variance of estimators of ${\boldsymbol{\beta }}$ for a given $\textbf{X}$ when generating from the latent eigenmodel. Variability captured by the boxplots reflects variation with $\textbf{X}$.

Figure D3. RMSE, scaled by $n^{1/2}$, of two estimators of $\rho$ when generating from the PX model: the EMM estimator and the amen estimator of the social relations model. Variability captured by the boxplots reflects variation in $n^{1/2}$ RMSE with $\textbf{X}$.

Figure D4. $t$ model: Scaled RMSE for PX: EMM and standard probit regression when generating from the PX model modified to have latent errors with a heavier-tailed distribution.

Figure E1. Out-of-sample performance in 10-fold cross-validation, as measured by area under the receiver operating characteristic curve (ROC AUC), plotted against mean runtime in the cross-validation for Krebs’ political books network. The estimators are standard probit assuming independent observations (Std. probit), the proposed PX estimator as estimated by EMM (PX: EMM), the social relations model as estimated by amen (SRM: amen), and the latent eigenmodel as estimated by amen (LE: amen).

Figure E2. The average of all pairwise expectations $\frac{1}{|\Theta _2|} \sum _{jk, lm \in \Theta _2} E[\epsilon _{jk} \epsilon _{lm} \, | \, y_{jk}, y_{lm} ]$ is shown in orange, and the linear approximation to this average, described in Section 5, is shown in dashed blue. In addition, pairwise conditional expectations $E[\epsilon _{jk} \epsilon _{lm} \, | \, y_{jk}, y_{lm} ]$ are shown in light gray, for a random subset of 500 relation pairs $(jk, lm) \in \Theta _2$.

Supplementary material: PDF

Marrs and Fosdick supplementary material
PDF 795.3 KB