
A generalized hypothesis test for community structure in networks

Published online by Cambridge University Press: 11 March 2024

Eric Yanchenko*, Department of Statistics, North Carolina State University, Raleigh, NC, USA
Srijan Sengupta, Department of Statistics, North Carolina State University, Raleigh, NC, USA
*Corresponding author: Eric Yanchenko; Email: ekyanche@ncsu.edu

Abstract

Researchers theorize that many real-world networks exhibit community structure where within-community edges are more likely than between-community edges. While numerous methods exist to cluster nodes into different communities, less work has addressed this question: given some network, does it exhibit statistically meaningful community structure? We answer this question in a principled manner by framing it as a statistical hypothesis test in terms of a general and model-agnostic community structure parameter. Leveraging this parameter, we propose a simple and interpretable test statistic used to formulate two separate hypothesis testing frameworks. The first is an asymptotic test against a baseline value of the parameter while the second tests against a baseline model using bootstrap-based thresholds. We prove theoretical properties of these tests and demonstrate how the proposed method yields rich insights into real-world datasets.

Type: Research Article

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.

Copyright © The Author(s), 2024. Published by Cambridge University Press

Algorithm 1. Greedy


Algorithm 2. Bootstrap hypothesis test
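Only the caption of Algorithm 2 is reproduced here. As a rough illustration of a bootstrap test against a baseline model, the sketch below fits an Erdős–Rényi null to the observed edge density, simulates networks from it, and compares the observed statistic with the simulated ones. It is a minimal sketch under stated assumptions, not the authors' algorithm: the statistic is a placeholder (global transitivity) standing in for the paper's $\mathsf{E2D2}$-based statistic $\tilde T(A)$, whose definition is not reproduced here. The names `transitivity`, `sample_er` and `bootstrap_test`, and the $(1 + \#\text{exceedances})/(B + 1)$ $p$-value convention, are assumptions for illustration.

```python
import random
from itertools import combinations
from typing import Callable, List, Set, Tuple

Graph = List[Set[int]]  # adjacency sets for nodes 0..n-1 (undirected, no self-loops)


def transitivity(g: Graph) -> float:
    """Placeholder statistic: global clustering coefficient.

    Hypothetical stand-in only; this is NOT the paper's E2D2 statistic.
    """
    closed = 0   # closed triples: each triangle is counted once per vertex (3 times)
    triples = 0  # connected triples (paths of length 2), counted at their center
    for nbrs in g:
        for u, w in combinations(sorted(nbrs), 2):
            triples += 1
            if w in g[u]:
                closed += 1
    return closed / triples if triples else 0.0


def sample_er(n: int, p: float, rng: random.Random) -> Graph:
    """Draw one Erdős–Rényi graph G(n, p) as adjacency sets."""
    g: Graph = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            g[i].add(j)
            g[j].add(i)
    return g


def bootstrap_test(
    g: Graph,
    statistic: Callable[[Graph], float] = transitivity,
    n_boot: int = 200,
    seed: int = 0,
) -> Tuple[float, List[float], float]:
    """Parametric bootstrap test of an observed graph against a fitted ER null.

    Returns (observed statistic, bootstrap statistics, one-sided p-value).
    """
    rng = random.Random(seed)
    n = len(g)
    m = sum(len(nbrs) for nbrs in g) // 2   # number of edges
    p_hat = m / (n * (n - 1) / 2)            # fitted ER edge probability
    t_obs = statistic(g)
    t_boot = [statistic(sample_er(n, p_hat, rng)) for _ in range(n_boot)]
    # Large values are treated as evidence against the null (one-sided test).
    exceed = sum(t >= t_obs for t in t_boot)
    p_value = (1 + exceed) / (1 + n_boot)
    return t_obs, t_boot, p_value
```

For example, two disjoint 10-cliques have transitivity 1, while Erdős–Rényi graphs with the same density have transitivity close to that density, so the bootstrap $p$-value lands near its minimum of $1/(B+1)$. Swapping `sample_er` for a degree-preserving sampler gives a test against a different baseline model.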


Figure 1. Rejection rates from simulation study. See Section 4 for complete details. (a) Baseline value null with fixed $\tilde \gamma (P)$; (b) baseline value null with fixed $n$; (c) Erdős–Rényi null with fixed $\tilde \gamma (P)$; (d) Erdős–Rényi null with fixed $n$; (e) Chung–Lu null with fixed $\tilde \gamma (P)$; (f) Chung–Lu null with fixed $n$.


Table 1. The number of nodes $n$ and edges $m$ for the real-world networks. $\tilde T(A)$ is the observed value of the $\mathsf{E2D2}$ parameter, and $\gamma_0$ is the largest null value such that the baseline value test would be rejected. Additionally, we report the $p$-values for the adjusted Spectral method and the Bootstrap method against different null hypotheses (ER = Erdős–Rényi, CL = Chung–Lu).
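The quantity $\gamma_0$ in Table 1 behaves like a one-sided lower confidence bound. As a hedged sketch, assume (an assumption for illustration, not a result quoted from the paper) that under $H_0: \tilde\gamma(P) = \gamma_0$ the statistic is approximately normal, $\tilde T(A) \approx N(\gamma_0, \hat\sigma^2)$, for some standard-error estimate $\hat\sigma$, and that large values of $\tilde T(A)$ favor community structure. The test then rejects when $(\tilde T(A) - \gamma_0)/\hat\sigma > z_{1-\alpha}$, so the boundary of the rejection region is $\gamma_0 = \tilde T(A) - z_{1-\alpha}\hat\sigma$. The function names and the `se` input are hypothetical.

```python
from statistics import NormalDist


def baseline_value_test(t_obs: float, gamma0: float, se: float, alpha: float = 0.05):
    """One-sided z-test of H0: gamma = gamma0 against H1: gamma > gamma0.

    Hypothetical setup: assumes t_obs is approximately N(gamma0, se**2) under H0.
    Returns (z statistic, p-value, reject at level alpha?).
    """
    z = (t_obs - gamma0) / se
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value, p_value < alpha


def largest_rejected_gamma0(t_obs: float, se: float, alpha: float = 0.05) -> float:
    """Boundary value: the test rejects for every gamma0 strictly below this."""
    return t_obs - NormalDist().inv_cdf(1.0 - alpha) * se
```

With the illustrative (not Table 1) values $\tilde T(A) = 0.30$, $\hat\sigma = 0.05$ and $\alpha = 0.05$, the boundary is $0.30 - 1.645 \times 0.05 \approx 0.218$: the test rejects for any $\gamma_0$ below it and fails to reject above it.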


Figure 2. Histograms of bootstrap samples from the proposed method for the two real datasets. The orange histogram is with the Erdős–Rényi null, and the blue histogram is with the Chung–Lu null. The vertical line (black) indicates the value of the test statistic.
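Figure 2 and Table 1 also use a Chung–Lu null, which keeps the observed degree heterogeneity that an Erdős–Rényi null ignores. A minimal sketch of sampling from it follows, assuming the common convention that distinct nodes $i \ne j$ are joined independently with probability $\min(1, d_i d_j / 2m)$, where $d_i$ is the observed degree of node $i$ and $m$ is the number of edges. The paper may fit or cap the model differently; `chung_lu_prob`, `sample_chung_lu` and `expected_degrees` are hypothetical names.

```python
import random
from itertools import combinations
from typing import List, Sequence, Set


def chung_lu_prob(d_i: float, d_j: float, two_m: float) -> float:
    """Edge probability between two distinct nodes, capped at 1."""
    return min(1.0, d_i * d_j / two_m)


def sample_chung_lu(degrees: Sequence[float], rng: random.Random) -> List[Set[int]]:
    """Draw one Chung–Lu graph (no self-loops) as adjacency sets."""
    n = len(degrees)
    two_m = sum(degrees)  # equals 2m when degrees come from an observed graph
    g: List[Set[int]] = [set() for _ in range(n)]
    if two_m == 0:
        return g
    for i, j in combinations(range(n), 2):
        if rng.random() < chung_lu_prob(degrees[i], degrees[j], two_m):
            g[i].add(j)
            g[j].add(i)
    return g


def expected_degrees(degrees: Sequence[float]) -> List[float]:
    """Exact expected degree of each node under the sampler above."""
    n = len(degrees)
    two_m = sum(degrees)
    if two_m == 0:
        return [0.0] * n
    return [
        sum(chung_lu_prob(degrees[i], degrees[j], two_m) for j in range(n) if j != i)
        for i in range(n)
    ]
```

Because self-loops are excluded, node $i$'s expected degree is $d_i - d_i^2/2m$ when no probability is capped, rather than exactly $d_i$; for ten nodes of degree 3 it is 2.7. In a bootstrap loop, passing the observed degree sequence to `sample_chung_lu` replaces the Erdős–Rényi draw.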

Supplementary material: Yanchenko and Sengupta supplementary material (file, 204.9 KB)