
Nonparametric two-sample test for networks using joint graphon estimation

Published online by Cambridge University Press:  15 May 2025

Benjamin Sischka
Affiliation:
Allianz Private Krankenversicherungs-AG, Risikomanagementfunktion, Munich, Germany
Göran Kauermann*
Affiliation:
Department of Statistics, Ludwig-Maximilians-Universität München, München, Germany
* Corresponding author: Göran Kauermann; Email: goeran.kauermann@lmu.de

Abstract

This paper focuses on the comparison of networks on the basis of statistical inference. For that purpose, we rely on smooth graphon models as a nonparametric modeling strategy that is able to capture complex structural patterns. The graphon itself can be viewed more broadly as a local density or intensity function on networks, making the model a natural choice for comparison purposes. More precisely, to gain information about the (dis-)similarity between networks, we extend graphon estimation towards modeling multiple networks simultaneously. In particular, fitting a single model implies aligning different networks with respect to the same graphon estimate. To do so, we employ an EM-type algorithm. Drawing on this network alignment consequently allows a comparison of the edge density at the local level. Based on that, we construct a chi-squared-type test for equivalence of network structures. Simulation studies and real-world examples support the applicability of our network comparison strategy.
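As a rough illustration of the graphon viewed as a local edge density, the sketch below simulates two networks of different size from one common graphon, which corresponds to the situation of equal underlying structure. It uses the standard graphon sampling scheme (latent node positions drawn uniformly on $[0,1]$, edges drawn independently with probability $w(u_i, u_j)$). The function `example_graphon`, the network sizes, and the random seed are assumptions made for illustration only and are not taken from the paper.

```python
import numpy as np


def example_graphon(u, v):
    """Hypothetical smooth, symmetric graphon on [0, 1]^2 (illustration only)."""
    return 0.1 + 0.8 * np.exp(-5.0 * (u - v) ** 2) * (u + v) / 2.0


def sample_network(n, graphon, rng):
    """Draw latent positions u_i ~ U(0, 1) and an undirected adjacency matrix."""
    u = rng.uniform(size=n)
    # Edge probabilities for all node pairs via broadcasting.
    probs = graphon(u[:, None], u[None, :])
    # Sample only the strict upper triangle, then mirror it (no self-loops).
    upper = np.triu(rng.uniform(size=(n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(int)
    return u, adj


rng = np.random.default_rng(0)
u1, y1 = sample_network(200, example_graphon, rng)  # first network
u2, y2 = sample_network(300, example_graphon, rng)  # second network, other size
print("edge densities:", y1.sum() / (200 * 199), y2.sum() / (300 * 299))
```

In practice the node positions are unobserved and have to be estimated, which is where the joint estimation described in the abstract comes in; the sketch only covers the data-generating side.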

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Functional coactivation networks of the human brain. The illustrated connectivity patterns result from averaging over multiple measurements for subjects with autism spectrum disorder (left) and typical development (right). Do these networks reveal a significant structural difference?


Figure 2. Dividing the unit square as domain of the graphon model into small segments for comparing network structure on a microscopic level. Left: division of $w^{\text{joint}}(\cdot ,\cdot )$ into approximately piecewise-constant rectangles. Middle and right: edge positions $(u_{i}^{(g)}, u_{j}^{(g)})^\top$ of two simulated networks with respect to $w^{\text{joint}}(\cdot ,\cdot )$; weakly colored crosses and intensively colored circles represent absent and present edges, respectively. The two networks can be compared by pairwise contrasting the edge proportions within the labeled rectangles.
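The rectangle-wise contrast described in this caption can be sketched as follows. This is a minimal sketch under strong assumptions: node positions are treated as known (oracle positions), the rectangles are an equal-width grid rather than the approximately piecewise-constant division of $w^{\text{joint}}(\cdot ,\cdot )$, and the summed quantity is a generic pooled two-proportion chi-squared term, not the paper's test statistic (10). The names `example_graphon`, `block_counts`, `blockwise_chi2`, and the choice of five bins are hypothetical.

```python
import numpy as np


def example_graphon(u, v):
    """Hypothetical smooth, symmetric graphon (illustration only)."""
    return 0.1 + 0.8 * np.exp(-5.0 * (u - v) ** 2) * (u + v) / 2.0


def sample_network(n, graphon, rng):
    """Standard graphon sampling: uniform positions, independent edges."""
    u = rng.uniform(size=n)
    probs = graphon(u[:, None], u[None, :])
    upper = np.triu(rng.uniform(size=(n, n)) < probs, k=1)
    return u, (upper | upper.T).astype(int)


def block_counts(u, adj, n_bins):
    """Per rectangle (k <= l): present edges d_kl and node pairs m_kl."""
    bins = np.minimum((u * n_bins).astype(int), n_bins - 1)
    i_idx, j_idx = np.triu_indices(len(u), k=1)
    k = np.minimum(bins[i_idx], bins[j_idx])
    l = np.maximum(bins[i_idx], bins[j_idx])
    d = np.zeros((n_bins, n_bins))
    m = np.zeros((n_bins, n_bins))
    np.add.at(m, (k, l), 1)                   # every node pair counts once
    np.add.at(d, (k, l), adj[i_idx, j_idx])   # present edges only
    return d, m


def blockwise_chi2(d1, m1, d2, m2):
    """Sum of pooled two-proportion chi-squared terms over informative rectangles."""
    d, m = d1 + d2, m1 + m2
    # Rectangles whose pooled counts have no present or no absent edges
    # carry no information and are skipped.
    ok = (m1 > 0) & (m2 > 0) & (d > 0) & (d < m)
    p_pool = d[ok] / m[ok]
    diff = d1[ok] / m1[ok] - d2[ok] / m2[ok]
    var = p_pool * (1.0 - p_pool) * (1.0 / m1[ok] + 1.0 / m2[ok])
    return float(np.sum(diff ** 2 / var)), int(ok.sum())


rng = np.random.default_rng(1)
u1, y1 = sample_network(200, example_graphon, rng)
u2, y2 = sample_network(300, example_graphon, rng)
d1, m1 = block_counts(u1, y1, n_bins=5)
d2, m2 = block_counts(u2, y2, n_bins=5)
stat, n_terms = blockwise_chi2(d1, m1, d2, m2)
print(f"statistic = {stat:.2f} over {n_terms} rectangles")
```

With oracle positions and independent edges, a sum of this kind would be compared against a chi-squared reference distribution whose degrees of freedom equal the number of informative rectangles. With estimated positions the reference distribution is less clear-cut, which is presumably why the paper also reports a simulated distribution under $H_0$.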


Figure 3. Joint graphon estimation for simulated networks with subsequent testing on equivalence of the underlying distribution models. The top row shows the true and the jointly estimated graphon on the left and right, respectively. The realizations of the terms of test statistic (10), representing the dissimilarities of the two networks per rectangle, are visualized at the bottom left, where $m_{kl}^{(g)} \geq 100$ for $k \neq l$ and $\geq 45$ otherwise (cf. contingency table on page 16). The final result of the test statistic (black solid vertical line) as well as its distribution under $H_0$ are illustrated at the bottom right, where the black solid step function and the blue dashed curve depict the simulated and the asymptotic chi-squared distribution, respectively. The red dashed vertical lines (separated by a red dot) visualize the critical values at a significance level of $5\%$, derived from the simulated (upper line) and the asymptotic distribution (lower line).


Figure 4. Performance of the testing procedure with regard to the resulting $p$-value; results are simulation-based. Top: empirical distribution of the $p$-value under $H_0$, illustrated as density (left, including a depiction of rejection rates at a significance level of $5\%$) and cumulative distribution function (right). The black dashed lines illustrate the desired distributional behavior of an optimal test. Number of repetitions for estimated / oracle node positions: $400$ / $10{,}000$. Bottom: distribution of the $p$-value under $H_1$ and the usage of oracle node positions (in box plot format); based on $1{,}000$ repetitions each. The x-axis illustrates different settings according to formulation (12) (higher value of $\gamma$ implies stronger deviation from $H_0$). The black dashed horizontal line represents the $5\%$ significance level, and the orange curve illustrates the corresponding power.


Figure 5. Comparison of two Facebook ego networks. Top: illustration of networks with coloring referring to estimated node positions. Middle: ordered adjacency matrices divided into blockwise segments. Bottom left: segment-wise differences between the two networks with $m_{kl}^{(g)} \geq 100$ for $k \neq l$ and $\geq 45$ otherwise (cf. contingency table on page 16); gray rectangles do not contain any observed edges ($d_{kl}=0$) and thus provide no information. Bottom right: realization of test statistic (black solid vertical line) plus corresponding distribution under $H_0$ (black solid step function and blue dashed curve represent simulated and asymptotic chi-squared distribution, respectively); critical values derived from the two types of distributions are represented by the upper and the lower red dashed vertical line (separated by a red dot).


Figure 6. Comparison of functional coactivation in the human brain between groups of subjects with autism spectrum disorder and with typical development. The top row shows the networks of the ASD and the TD group on the left- and right-hand side, respectively. All illustration aspects are equivalent to the representation in Figure 5. The number of nodes per rectangle is again given by $m_{kl}^{(g)} \geq 100$ for $k \neq l$ and $\geq 45$ otherwise, where N/A’s in the blockwise differences result from $d_{kl}$ or $m_{kl} - d_{kl}$ being zero.


Figure 7. Left: proposal density for different current states $u^{(g), \, <t>}_{i}$ with $\sigma _v = 1$. Right: standard deviation of the proposal density against current state $u^{(g), \, <t>}_{i}$; different profiles represent the behavior for different settings of $\sigma _v$.

Supplementary material: File

Sischka and Kauermann supplementary material

File 2.1 MB