Model-based clustering for network data via a latent shrinkage position cluster model

Published online by Cambridge University Press:  17 November 2025

Xian Yao Gwee
Affiliation:
School of Mathematics and Statistics, University College Dublin, Dublin, Ireland
Isobel Claire Gormley
Affiliation:
School of Mathematics and Statistics, University College Dublin, Dublin, Ireland
Michael Fop*
Affiliation:
School of Mathematics and Statistics, University College Dublin, Dublin, Ireland
*Corresponding author: Michael Fop; Email: michael.fop@ucd.ie

Abstract

Low-dimensional representation and clustering of network data are tasks of great interest across various fields. Latent position models are routinely used for this purpose by assuming that each node has a location in a low-dimensional latent space and by enabling node clustering. However, these models fall short through their inability to simultaneously determine the latent space dimension and number of clusters. Here we introduce the latent shrinkage position cluster model (LSPCM), which addresses this limitation. The LSPCM posits an infinite-dimensional latent space and assumes a Bayesian nonparametric shrinkage prior on the latent positions’ variance parameters, resulting in higher dimensions having increasingly smaller variances and aiding the identification of dimensions with non-negligible variance. Further, the LSPCM assumes the latent positions follow a sparse finite Gaussian mixture model, allowing for automatic inference on the number of clusters related to non-empty mixture components. As a result, the LSPCM simultaneously infers the effective dimension of the latent space and the number of clusters, eliminating the need to fit and compare multiple models. The performance of the LSPCM is assessed via simulation studies and demonstrated through application to two real Twitter network datasets from sporting and political contexts. Open-source software is available to facilitate widespread use of the LSPCM.
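
The two ingredients described in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' implementation or their software. It assumes a multiplicative construction in which per-dimension precisions are cumulative products of shrinkage strengths, with strengths after the first forced to be at least 1 by shifting a gamma draw (a simplification chosen here for illustration). It also assumes a symmetric Dirichlet prior with a small concentration on the mixture weights. The function names, the parameter names `a1`, `a2`, and `nu`, and all numerical values are hypothetical.

```python
import numpy as np


def simulate_shrinking_variances(p_max, rng, a1=2.0, a2=3.0):
    """Draw per-dimension variances that decrease with the dimension index.

    Assumption: precisions are cumulative products of shrinkage strengths.
    Strengths after the first are 1 + Gamma(a2, 1), so each is at least 1.
    """
    delta = np.empty(p_max)
    delta[0] = rng.gamma(shape=a1, scale=1.0)
    delta[1:] = 1.0 + rng.gamma(shape=a2, scale=1.0, size=p_max - 1)
    precision = np.cumprod(delta)  # non-decreasing from the second term on
    return 1.0 / precision         # variances are therefore non-increasing


def count_active_clusters(n_nodes, n_components, nu, rng):
    """Allocate nodes to components of a sparse finite mixture.

    Assumption: weights follow a symmetric Dirichlet(nu, ..., nu); a small
    nu concentrates mass on few components, leaving many of them empty.
    """
    weights = rng.dirichlet(np.full(n_components, nu))
    labels = rng.choice(n_components, size=n_nodes, p=weights)
    g_plus = np.unique(labels).size  # number of non-empty components
    return labels, g_plus


rng = np.random.default_rng(0)
variances = simulate_shrinking_variances(p_max=6, rng=rng)
labels, g_plus = count_active_clusters(n_nodes=50, n_components=10, nu=0.1, rng=rng)
print("variances by dimension:", np.round(variances, 4))
print("non-empty components:", g_plus, "of 10")
```

In this sketch the variances shrink across dimensions by construction, and the number of non-empty components is typically well below the number of components supplied, which mirrors the roles of $\hat {p}$ and $G_+$ in the figures listed below.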

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. The graphical model representation of the LSPCM.

Figure 2. (a) Posterior distribution of the number of active clusters $G_+$ and (b) the distribution of the effective latent space dimension $\hat {p}$ across 30 small simulated networks. The true number of clusters and latent dimensions are highlighted in red.

Figure 3. (a) Posterior distribution of the number of active clusters $G_+$ and (b) the distribution of the effective latent space dimension $\hat {p}$ across 30 moderately sized simulated networks. The true number of clusters and latent dimensions are highlighted in red.

Figure 4. (a) Posterior distribution of the number of active clusters $G_+$ and (b) the distribution of the effective latent space dimension $\hat {p}$ across 30 small simulated networks with clusters of slightly different volumes. The true number of clusters and dimensions are highlighted in red.

Figure 5. (a) Posterior distribution of the number of active clusters $G_+$ and (b) the distribution of the effective latent space dimension $\hat {p}$ across 30 small simulated networks with clusters of highly different volumes. The true number of clusters and dimensions are highlighted in red.

Figure 6. One of the 30 simulated networks when clusters have highly different volumes with (a) the true positions and cluster labels and (b) the LSPCM posterior mean positions conditioned on the modal effective number of dimensions ($\hat {p}_{m}=2$) and the estimated cluster labels.

Figure 7. For the football players network, (a) the posterior distribution of the number of non-empty components $G_+$, and (b) the distribution of the number of effective latent space dimensions $\hat {p}$.

Figure 8. For the football players network: (a) the Fruchterman-Reingold layout with players colored by club, (b) the Fruchterman-Reingold layout with players colored by inferred cluster label, and (c) the posterior mean latent positions on the $\hat {p}_m = 2$ effective latent dimensions, colored by inferred cluster label.

Figure 9. Heat map of the posterior similarity matrix of the cluster labels inferred from the football network, ordered by the cluster labels.

Figure 10. For the Irish politicians network: (a) the posterior distribution of the number of non-empty components $G_+$, and (b) the distribution of the effective latent space dimension $\hat {p}$.

Table 1. Cross-tabulation of political party membership and the LSPCM representative cluster labels

Figure 11. Fruchterman-Reingold layout of the Irish politicians network with nodes colored by (a) political party affiliation and (b) LSPCM inferred cluster labels.

Figure 12. Heat map of the posterior similarity matrix of the Irish politicians ordered by the cluster labels.

Figure 13. LSPCM inferred posterior mean latent positions of the Irish politicians on the $\hat {p}_m = 4$ dimensions with nodes colored by cluster membership.

Table 2. Hyperparameter specifications for the LSPCM model

Figure 14. For the simulated network with $p^*=2$ and $G^*=3$, the posterior distribution of (a) the shrinkage strength parameter, (b) the latent position variance parameter, and (c) the $\alpha$ parameter.

Figure 15. For the simulated network with $p^*=3$ and $G^*=7$, the posterior distribution of (a) the shrinkage strength parameter, (b) the latent position variance parameter, and (c) the $\alpha$ parameter.

Figure 16. For the football Twitter network: an example of a trace plot for the parameters (a) $\alpha$, (b) $\nu$, (c) $\delta _1$, and (d) $\delta _2$.

Figure 17. For the football Twitter network: an example of a trace plot for the parameters (a) $\psi _1$, (b) $\psi _2$ and (c) $\psi _3$.

Figure 18. For the football Twitter network, (a) the posterior distributions of the shrinkage strength parameters across dimensions, (b) the posterior distributions of the variance parameters across dimensions, and (c) the posterior distribution of $\alpha$.

Figure 19. For the Irish politicians Twitter network: an example of a trace plot for the parameters (a) $\alpha$, (b) $\nu$, (c) $\delta _1$, (d) $\delta _2$, (e) $\delta _3$, and (f) $\delta _4$.

Figure 20. For the Irish politicians Twitter network: an example of a trace plot for the parameters (a) $\psi _1$, (b) $\psi _2$, (c) $\psi _3$, (d) $\psi _4$, and (e) $\psi _5$.

Figure 21. For the Irish politicians Twitter network: (a) the posterior distributions of the shrinkage strength parameters across dimensions, (b) the posterior distributions of the variance parameters across dimensions, and (c) the posterior distribution of $\alpha$.

Figure 22. For the Irish politicians network with $a_{\nu }=5, b_{\nu }=5G$, where $G=20$: (a) the posterior distribution of the number of non-empty components $G_+$, and (b) the distribution of the effective latent space dimension $\hat {p}$.

Figure 23. For the Irish politicians network with $a_{\nu }=5, b_{\nu }=10G$, where $G=10$: (a) the posterior distribution of the number of non-empty components $G_+$, and (b) the distribution of the effective latent space dimension $\hat {p}$.

Table 3. For the Irish politicians network with $a_{\nu }=5, b_{\nu }=5G$, where $G=20$: cross-tabulation of political party membership and the LSPCM representative cluster labels

Table 4. For the Irish politicians network with $a_{\nu }=5, b_{\nu }=10G$, where $G=10$: cross-tabulation of political party membership and the LSPCM representative cluster labels