
An MBO method for modularity optimisation based on total variation and signless total variation

Published online by Cambridge University Press:  25 November 2024

Zijun Li
Affiliation:
Department of Mathematics, Humboldt-Universität zu Berlin, Berlin, Germany
Yves van Gennip*
Affiliation:
Delft Institute of Applied Mathematics, Delft University of Technology, Delft, Netherlands
Volker John
Affiliation:
Weierstrass Institute for Applied Analysis and Stochastics, Berlin, Germany Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
*
Corresponding author: Yves van Gennip; Email: y.vangennip@tudelft.nl

Abstract

In network science, community detection is one of the most significant and challenging problems. Modularity [1] is a measure of community structure that compares connectivity in the network with the expected connectivity in a graph sampled from a random null model. Its optimisation is a common approach to the community detection problem. We present a new method for modularity maximisation, based on the observation that modularity can be expressed in terms of the total variation on the graph and the signless total variation on the null model. The resulting algorithm is of Merriman–Bence–Osher (MBO) type. Unlike earlier methods of this type, the new method can easily accommodate different choices of the null model. Besides theoretical investigations of the method, this paper includes numerical comparisons with other community detection methods, including the MBO-type methods of Hu et al. [2] and Boyd et al. [3] and the Leiden algorithm [4].
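The abstract defines modularity as a comparison between a network's observed connectivity and the expected connectivity under a random null model. As a hedged illustration, a minimal NumPy sketch of the standard Newman–Girvan modularity with resolution parameter $\gamma$ (the function name and example graph are our own choices, not the paper's MMBO machinery) could look like:

```python
import numpy as np

def modularity(A, labels, gamma=1.0):
    """Newman-Girvan modularity with resolution parameter gamma:
    Q = (1/2m) * sum_ij (A_ij - gamma * k_i * k_j / (2m)) * delta(c_i, c_j)."""
    k = A.sum(axis=1)                        # (weighted) node degrees
    two_m = k.sum()                          # 2m: total edge weight, counted twice
    same = np.equal.outer(labels, labels)    # delta(c_i, c_j)
    B = A - gamma * np.outer(k, k) / two_m   # modularity matrix
    return (B * same).sum() / two_m

# Two triangles joined by a single edge; the natural two-community split.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])
print(round(modularity(A, labels), 4))  # → 0.3571 (= 5/14)
```

Maximising this quantity over all partitions of the node set is the optimisation problem that the paper's MMBO schemes address.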

Information

Type
Papers
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Table 1. Summary of frequently used symbols


Algorithm 1. The MMBO scheme using the closed-form solution of the linear-dynamics step


Algorithm 2. The MMBO scheme using the Euler finite-difference discretisation
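Algorithms 1 and 2 above differ in how they treat the linear-dynamics step (closed-form solution versus explicit Euler discretisation), while both alternate that step with thresholding, as is characteristic of MBO schemes. As a hedged illustration of this generic structure only (not the paper's exact MMBO operators, which combine total variation and signless total variation terms; the operator `L`, step size, and stopping rule below are placeholders):

```python
import numpy as np

def mbo_iterate(L, U0, tau=0.1, n_euler=3, max_iter=50):
    """Generic MBO-type iteration on an n x K cluster-indicator matrix U:
    alternate a few explicit-Euler steps of dU/dt = -L U with row-wise
    thresholding of U to the nearest standard basis vector (simplex vertex)."""
    U = U0.astype(float).copy()
    for _ in range(max_iter):
        V = U.copy()
        for _ in range(n_euler):                      # Euler steps for the linear dynamics
            V = V - (tau / n_euler) * (L @ V)
        U_new = np.eye(U.shape[1])[V.argmax(axis=1)]  # thresholding step
        if np.array_equal(U_new, U):                  # stop when the partition is stationary
            break
        U = U_new
    return U
```

A well-separated partition is a fixed point of such an iteration: starting from the correct indicator matrix on a graph of two triangles joined by one edge, a short diffusion perturbs the rows only slightly, and the threshold restores the same labels.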


Table 2. MNIST: parameter settings for the Nyström extension and edge weights in (59) (left) and parameter setting of the MMBO scheme (right)


Table 3. MNIST: average time per run for computing eigenvalues and eigenvectors, the average time per run for all MBO iterations, and the average number of MBO iterations per run for the MMBO schemes, Hu et al.’s method, and Boyd et al.’s method when using $m=130$ and $K=764$ and the partition-based stopping criterion from (48). The number of iterations is rounded to the nearest integer. The best average result in each column is shown in boldface


Table 4. MNIST: average time per run for computing eigenvalues and eigenvectors, the average time per run for all MBO iterations, and the average number of MBO iterations per run for the MMBO scheme, Hu et al.’s method, and Boyd et al.’s method when using $m=130$ and $K=764$ and the modularity-based stopping condition from (49). The number of iterations is rounded to the nearest integer. The best average result in each column is shown in boldface


Table 5. MNIST: average performance of algorithms regarding modularity scores, various classification metrics, and average computation time per run under NG null model. The best average results in each column are shown in boldface (we exclude the ground truth numbers). For the number of non-empty clusters we consider the one closest to the ground truth to be ‘best’ in this context


Table 6. MNIST: average performance of different algorithms regarding modularity scores, various classification metrics, and total computation time under NG null model when using $m=130$ and $K=764$ and the partition-based stopping criterion (48). The best average result in each column is shown in boldface. For the number of non-empty clusters we consider the one closest to the ground truth number $10$ to be ‘best’ in this context


Table 7. MNIST: average performance of algorithms regarding modularity scores, various classification metrics, and total computation time under the NG null model with (‘$10\%$’) and without (‘no’) $10\%$ mild semi-supervision when using the modularity-based stopping condition (49). In both the unsupervised and the mildly semi-supervised case, $m=130$ and $K = 764$ are used. The best average results with and without mild semi-supervision in each column are shown in boldface. For the number of non-empty clusters we consider the one closest to the ground truth number $10$ to be ‘best’ in this context


Figure 1. MNIST: comparison of the spectra of different operators with $\gamma =1$ under the NG null model. In each of the plots, one of the two curves is hidden behind the other one.


Figure 2. MNIST: relationship between the number of eigenvalues used and modularity. MMBO Algorithm 1 uses the modularity-based stopping condition (49) and $\gamma =1$.


Figure 3. MNIST: Modularity score versus number of iterations, obtained with $\gamma =1$ without stopping criterion.


Table 8. Parameter settings used to construct the SBM


Figure 4. SBM: Adjacency matrices of realisations of the strong and weak community structure where the number of blocks is $10$.


Figure 5. SBM with strong and weak community structure: spectra of $L_{\textrm{Hu},\textrm{sym}}$, $L_{\textrm{Hu},\textrm{rw}}$, $L_{\textrm{Boyd},\textrm{sym}}$, $L_{\textrm{Boyd},\textrm{rw}}$ and four choices of $L_{\textrm{mix}}\in \{L_{W_{\textrm{sym}}}+ \gamma Q_{P_{\textrm{sym}}}, L_{W_{\textrm{rw}}}+\gamma Q_{P_{\textrm{rw}}}, L_{{B^+_\gamma }_{\textrm{sym}}}+Q_{{B^-_\gamma }_{\textrm{sym}}}, L_{{B^+_\gamma }_{\textrm{rw}}}+Q_{{B^-_\gamma }_{\textrm{rw}}}-D_{B^+_\gamma }^{-1} D_{B_\gamma } Q_{{B^-_\gamma }_{\textrm{rw}}} \}$ with $\gamma =1$ and the NG null model, for a single realisation of an SBM with $10$ blocks. The following graphs overlap: $L_{\textrm{Hu},\textrm{sym}}$ and $L_{\textrm{Hu},\textrm{rw}}$; $L_{\textrm{Boyd},\textrm{sym}}$ and $L_{\textrm{Boyd},\textrm{rw}}$; $L_{W_{\textrm{sym}}}+\gamma Q_{P_{\textrm{sym}}}$ and $L_{W_{\textrm{rw}}}+\gamma Q_{P_{\textrm{rw}}}$ (which is expected thanks to Remark 5.2); $L_{{B^+_1}_{\textrm{sym}}}+Q_{{B^-_1}_{\textrm{sym}}}$ and (using that $D_{B_1}=0$ by (20)) $L_{{B^+_1}_{\textrm{rw}}}+Q_{{B^-_1}_{\textrm{rw}}}-D_{B^+_1}^{-1} D_{B_1} Q_{{B^-_1}_{\textrm{rw}}}=L_{{B^+_1}_{\textrm{rw}}}+Q_{{B^-_1}_{\textrm{rw}}}$ (which is expected from Remark 5.3).


Table 9. Parameter setting of the MMBO schemes, Hu et al.’s and Boyd et al.’s methods in SBM


Figure 6. SBM with strong and weak community structures: modularity as a function of the number of eigenvalues used ($m$) for an SBM with $10$ blocks. The number of clusters $K$ used by the MMBO schemes and by Hu et al.’s and Boyd et al.’s methods is obtained from the Leiden algorithm, that is, $K=10$ for both the strong and the weak community structure. All methods use $\gamma =1$, the partition-based stopping condition (48) and the NG null model. The solid red circle curve and the solid purple triangle curve are overlapped by the dashed brown diamond curve and the dashed pink octagon curve, respectively.


Table 10. SBM: average NG modularity, other classification metrics scores, and average computation time per run obtained from $20$ runs. The best average results for the strong and for the weak community structure in each column are shown in boldface. For the number of non-empty clusters we consider the one closest to the ground truth number $10$ to be ‘best’ in this context


Table 11. SBM with strong community structure: average performance of algorithms regarding modularity scores, various classification indicators, average time per run, and average number of iterations per run. The number of clusters $K$ used by spectral clustering, MMBO schemes, Hu et al.’s, and Boyd et al.’s methods are obtained from the Leiden algorithm, that is, $K = 10$. Moreover, for the MMBO schemes, Hu et al.’s method and Boyd et al.’s method, we choose $m =12$. The best average results in each column are shown in boldface (we exclude the ground truth numbers). For the number of non-empty clusters we consider the one closest to the ground truth number to be ‘best’ in this context


Table 12. SBM with weak community structure: average performance of algorithms regarding modularity scores, various classification indicators, average time per run, and average number of iterations per run. The number of clusters $K$ used by spectral clustering, MMBO schemes, Hu et al.’s, and Boyd et al.’s methods are obtained from the Leiden algorithm, that is, $K = 10$. Moreover, for the MMBO schemes, Hu et al.’s method and Boyd et al.’s method, we choose $m =10$. The best average results in each column are shown in boldface (we exclude the ground truth numbers). For the number of non-empty clusters we consider the one closest to the ground truth number to be ‘best’ in this context


Table 13. Two cows: parameter settings for the Nyström extension and edge weights in (59) (left) and parameter setting of the MMBO schemes (right)


Table 14. Two cows: average performance of algorithms regarding modularity scores, various classification metrics, and computation time per run under NG null model. The best average result in each column is shown in boldface (we exclude the ground truth numbers). For the number of non-empty clusters we consider the one closest to the ground truth number $3$ to be ‘best’ in this context


Table 15. Two cows: average performance of algorithms under the NG null model regarding modularity scores, various classification metrics, and computation time per run. In all cases, $K=168$ is applied to spectral clustering, the MMBO schemes, Hu et al.’s method, and Boyd et al.’s method. Note that for the MMBO schemes and Hu et al.’s and Boyd et al.’s methods, we choose $m = K=168$ and use the modularity-based stopping condition (49). The best average results in each column are shown in boldface. For the number of non-empty clusters we consider the one closest to the ground truth number $3$ to be ‘best’ in this context


Table 16. Two cows: average performance of algorithms regarding modularity scores, various classification metrics, and computation time per run under the NG null model. In all cases, $K=3$ is applied to spectral clustering, the MMBO schemes, Hu et al.’s method, and Boyd et al.’s method. Note that for the MMBO schemes and Hu et al.’s and Boyd et al.’s methods, we choose $m =K=3$ and use the modularity-based stopping condition (49). The best average results in each column are shown in boldface. For the number of non-empty clusters we consider the one closest to the ground truth number $3$ to be ‘best’ in this context


Figure 7. The ‘two cows’ image segmented using different methods with $\gamma =1$. The number of clusters $K$ used by the MMBO algorithms, Hu et al.’s method and Boyd et al.’s method is obtained from Louvain’s method, that is, $K = 168$. Moreover, for the MMBO schemes, Hu et al.’s method and Boyd et al.’s method, we choose $m = K=168$. Each method’s displayed image segmentation result is the one with the highest modularity score for that method from among $20$ runs.


Figure 8. The ‘two cows’ image segmented using different methods with $\gamma =1$. The number of clusters $K$ used by the MMBO algorithms, Hu et al.’s method and Boyd et al.’s method is obtained from the ground truth (shown in Figure 7), that is, $K = 3$. Moreover, for the MMBO schemes, Hu et al.’s method and Boyd et al.’s method, we choose $m = K=3$. Each method’s displayed image segmentation result is the one with the highest modularity score for that method from among $20$ runs.


Figure 9. SBM with strong and weak community structure (see Section 7.3 for details): spectra of $L_{W_{\textrm{sym}}}$ and $L_{W_{\textrm{rw}}}$. As expected [74], both operators have the same eigenvalues.