
Automated detection of edge clusters via an overfitted mixture prior

Published online by Cambridge University Press: 19 January 2024

Hanh T. D. Pham*
University of Iowa, Iowa City, IA, USA
Daniel K. Sewell
University of Iowa, Iowa City, IA, USA
*Corresponding author: Hanh T. D. Pham; Email: hanh-pham@uiowa.edu

Abstract

Most community detection methods focus on clustering actors with common features in a network. However, in many real-life applications, clustering edges offers a more intuitive way to understand network structure. Among existing methods for network edge clustering, the majority are algorithmic, with the exception of the latent space edge clustering (LSEC) model proposed by Sewell (Journal of Computational and Graphical Statistics, 30(2), 390–405, 2021). LSEC was shown to perform well in simulations and real-data analyses, but fitting this model requires prior knowledge of the number of clusters and the number of latent dimensions, which are often unknown to researchers. Within a Bayesian framework, we propose an extension to the LSEC model using a sparse finite mixture prior that supports automated selection of the number of clusters. We refer to our proposed approach as the automated LSEC, or aLSEC. We develop a variational Bayes generalized expectation-maximization approach and a Hamiltonian Monte Carlo-within-Gibbs algorithm for estimation. Our simulation study showed that aLSEC reduced run time by a factor of 10 to over 100 compared to LSEC. Like LSEC, aLSEC maintains a computational cost that grows linearly with the number of actors in a network, making it scalable to large sparse networks. We developed the R package aLSEC, which implements the proposed methodology.
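The abstract does not spell out the form of the prior, so the following is only a sketch of the general sparse (overfitted) finite mixture idea, not the aLSEC prior itself. The mixture is given more components than are likely needed (`k_max`, a hypothetical name) together with a symmetric Dirichlet prior on the mixture weights whose concentration `e0` (also hypothetical) is small, so that superfluous components tend to stay empty. The number of clusters is then read off as the number of non-empty components. All numeric values below are arbitrary choices for illustration.

```python
import random


def count_occupied_clusters(n_items, k_max, e0, rng):
    """Draw mixture weights from a symmetric Dirichlet(e0, ..., e0) over k_max
    components, assign n_items to components, and return how many are non-empty."""
    # A Dirichlet draw is obtained by normalizing independent Gamma(e0, 1) draws.
    gammas = [rng.gammavariate(e0, 1.0) for _ in range(k_max)]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    # Assign each item (for example, an edge) to a component according to the weights.
    labels = rng.choices(range(k_max), weights=weights, k=n_items)
    return len(set(labels))


def mean_occupied(n_items, k_max, e0, reps, seed=0):
    """Average number of non-empty components over `reps` independent prior draws."""
    rng = random.Random(seed)
    return sum(count_occupied_clusters(n_items, k_max, e0, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    # A small e0 concentrates prior mass on a few components; a large e0 spreads it out.
    print("e0 = 0.05:", mean_occupied(200, 10, 0.05, 200))
    print("e0 = 4.0: ", mean_occupied(200, 10, 4.0, 200))
```

With the small concentration, most of the ten available components typically remain empty, which is what lets an overfitted mixture "select" the number of clusters without fixing it in advance.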

Information

Type: Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Relationship between parameters in the LSEC model (circles) and parameters in the proposed extension (rectangles).


Figure 2. Example of a simulated network with $n = 400$, $K = 6$, and $p = 2$. Edges are colored according to their cluster assignment. The network layout was chosen to display the six edge clusters clearly.


Table 1. Clustering results comparing aLSEC (assuming $p = 3$) with existing methods. For both NMI and ARI, higher values indicate better performance.
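NMI and ARI both score an estimated edge partition against the true one. The caption does not state which NMI normalization was used; the sketch below assumes the common arithmetic-mean normalization, $\mathrm{NMI}(A,B) = I(A;B) \big/ \{[H(A)+H(B)]/2\}$, and the standard Hubert–Arabie form of the ARI. It is an illustrative from-scratch implementation, not the code used for Table 1.

```python
import math
from collections import Counter


def _entropy(counts, n):
    """Shannon entropy (natural log) of a partition given its cluster sizes."""
    return -sum((c / n) * math.log(c / n) for c in counts.values() if c > 0)


def nmi(labels_true, labels_pred):
    """Normalized mutual information with arithmetic-mean normalization (an assumption)."""
    n = len(labels_true)
    ca, cb = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum((nij / n) * math.log(n * nij / (ca[a] * cb[b])) for (a, b), nij in joint.items())
    ha, hb = _entropy(ca, n), _entropy(cb, n)
    if ha == 0.0 and hb == 0.0:
        return 1.0  # both partitions consist of a single cluster
    return mi / ((ha + hb) / 2)


def ari(labels_true, labels_pred):
    """Adjusted Rand index (Hubert-Arabie); assumes at least two items."""
    n = len(labels_true)

    def comb2(x):
        return x * (x - 1) / 2

    sum_ij = sum(comb2(c) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb2(c) for c in Counter(labels_true).values())
    sum_b = sum(comb2(c) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb2(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions trivial in the same way
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 2]
    estimate = [1, 1, 0, 0, 0, 2]
    print(f"NMI = {nmi(truth, estimate):.3f}, ARI = {ari(truth, estimate):.3f}")
```

Both scores depend only on which items are grouped together, not on the label values, so an estimate that recovers the true clusters under different label names still scores 1.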


Figure 3. Ratios of the run time of LSEC to the run time of aLSEC.


Table 2. Sensitivity analysis of aLSEC under different assumed values of $p$ (one value per column). NMI values are averaged over 200 simulated networks; higher values indicate better performance.


Figure 4. Results of applying aLSEC to the California patient transfer network. Each point is an actor, placed at its MAP estimate of the latent positions $\boldsymbol{U}$. Edges are colored according to their cluster assignment.


Figure 5. Results of running an epidemic simulation on the network with edges labeled (a) according to the aLSEC results and (b) randomly. Each curve shows, over time, the number of edges in one cluster that carry transmission.
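The caption does not state which epidemic model was simulated. As a purely illustrative sketch, assuming a discrete-time susceptible-infected (SI) process with a hypothetical per-edge transmission probability `beta`, the code below tallies, at each time step, how many edges of each cluster carry a transmission. Passing shuffled labels mimics the random-labeling comparison in panel (b). The function and parameter names are hypothetical.

```python
import random
from collections import Counter


def simulate_si(edges, labels, seed_node, beta, n_steps, rng):
    """Discrete-time SI process on an undirected edge list (illustrative assumption).

    Returns (history, infected): history[t] maps each cluster label to the number
    of edges that carried a transmission at step t.
    """
    infected = {seed_node}
    history = []
    for _ in range(n_steps):
        counts = Counter()
        newly_infected = set()
        for (u, v), label in zip(edges, labels):
            # Only edges with exactly one endpoint infected at the start of the
            # step can transmit; a node reached by two edges counts both edges.
            if (u in infected) != (v in infected) and rng.random() < beta:
                newly_infected.add(v if u in infected else u)
                counts[label] += 1
        infected |= newly_infected
        history.append(dict(counts))
    return history, infected


if __name__ == "__main__":
    rng = random.Random(42)
    edges = [(0, 1), (1, 2), (2, 3), (0, 3), (3, 4)]
    labels = ["A", "A", "B", "B", "C"]  # hypothetical cluster labels
    shuffled = labels[:]
    rng.shuffle(shuffled)  # random labeling, as in panel (b)
    for name, lab in [("cluster labels", labels), ("random labels", shuffled)]:
        history, _ = simulate_si(edges, lab, 0, 0.5, 5, random.Random(7))
        print(name, history)
```

Using the same random seed for both runs keeps the transmission events identical, so any difference between the two sets of curves comes only from how the edges are labeled.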


Figure 6. Posterior distribution of the number of clusters $K$ after applying the HMC-within-Gibbs algorithm to the UK Faculty network.
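Under a sparse finite mixture, a common way to obtain a posterior for $K$ is to count the non-empty clusters in each posterior draw of the edge partition. The sketch below assumes the sampler's output is available as a list of edge-label vectors (`partition_draws`, a hypothetical name); whether Figure 6 was computed exactly this way is an assumption.

```python
from collections import Counter


def posterior_num_clusters(partition_draws):
    """Estimate P(K = k | data) as the share of posterior draws whose edge
    partition has exactly k non-empty clusters."""
    counts = Counter(len(set(z)) for z in partition_draws)
    total = len(partition_draws)
    return {k: c / total for k, c in sorted(counts.items())}


if __name__ == "__main__":
    # Toy draws for 6 edges; only the number of distinct labels in a draw matters.
    draws = [[0, 0, 1, 1, 2, 2], [3, 3, 1, 1, 3, 3], [0, 0, 1, 1, 2, 5], [2, 2, 0, 0, 1, 1]]
    print(posterior_num_clusters(draws))  # {2: 0.25, 3: 0.5, 4: 0.25}
```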


Figure 7. The MAP estimate of the latent positions $\boldsymbol{U}$. Edge colors correspond to the MAP edge partition. The three hollow marker shapes indicate the three schools; the two solid circles indicate the two individuals who did not report their school.


Figure 8. Heat map of the matrix $P_{M \times M}$, whose entries are the posterior probabilities that two edges belong to the same cluster. Darker colors indicate higher probabilities. From left to right, edges are ordered into blocks: the first three blocks (sizes $317$, $250$, and $96$) contain edges representing within-school connections, and the last block (size $152$) contains edges representing inter-school connections.
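Each entry of $P_{M \times M}$ can be estimated by Monte Carlo as the fraction of posterior draws in which two edges receive the same label. Because labels are compared only within a single draw, the estimate is unaffected by label switching across draws. A minimal pure-Python sketch, assuming the posterior draws are stored as edge-label vectors of length $M$ (`partition_draws`, a hypothetical name):

```python
def coclustering_matrix(partition_draws):
    """Return P with P[i][j] = fraction of posterior draws in which edges i and j
    are assigned to the same cluster."""
    n_draws = len(partition_draws)
    m = len(partition_draws[0])
    counts = [[0] * m for _ in range(m)]
    for z in partition_draws:
        for i in range(m):
            for j in range(i, m):
                if z[i] == z[j]:
                    counts[i][j] += 1
                    if i != j:
                        counts[j][i] += 1  # the matrix is symmetric
    return [[c / n_draws for c in row] for row in counts]


if __name__ == "__main__":
    draws = [[0, 0, 1, 1], [2, 2, 2, 1], [1, 1, 0, 0]]
    for row in coclustering_matrix(draws):
        print(" ".join(f"{p:.2f}" for p in row))
```

The cost is proportional to the number of draws times $M^2$, so for networks with hundreds of edges and many draws a vectorized array implementation would be preferable; the loop version is kept here for readability.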

Supplementary material: PDF

Pham and Sewell supplementary material

Appendix
