Descriptive vs. Inferential Community Detection in Networks: Pitfalls, Myths and Half-Truths

Tiago P. Peixoto

doi:10.1017/9781009118897

References

[1]Fortunato, Santo, “Community detection in graphs,” Physics Reports 486, 75–174 (2010).

[2]Fortunato, Santo and Hric, Darko, “Community detection in networks: A user guide,” Physics Reports (2016).

[3]Moore, Cristopher, “The Computer Science and Physics of Community Detection: Landscapes, Phase Transitions, and Hardness,” arXiv:1702.00467 (2017).

[4]Abbe, Emmanuel, “Community detection and stochastic block models: recent developments,” arXiv:1703.10146 [cs, math, stat] (2017).

[5]Peixoto, Tiago P., “Bayesian Stochastic Blockmodeling,” in Advances in Network Clustering and Blockmodeling, edited by Doreian, P., Batagelj, V., and Ferligoj, A. (John Wiley & Sons, Ltd, 2019) pp. 289–332.

[6]Decelle, Aurelien, Krzakala, Florent, Moore, Cristopher, and Zdeborová, Lenka, “Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications,” Physical Review E 84, 066106 (2011).

[7]Zdeborová, Lenka and Krzakala, Florent, “Statistical physics of inference: thresholds and algorithms,” Advances in Physics 65, 453–552 (2016).

[8]Schaub, Michael T., Delvenne, Jean-Charles, Rosvall, Martin, and Lambiotte, Renaud, “The many facets of community detection in complex networks,” Applied Network Science 2, 1–13 (2017).

[9]Peixoto, Tiago P., “The graph-tool python library,” figshare (2014), doi:10.6084/m9.figshare.1164194, available at https://graph-tool.skewed.de.

[10]Baker, R. Jacob, CMOS: Circuit Design, Layout, and Simulation, 3rd ed. (Wiley-IEEE Press, Piscataway, NJ; Hoboken, NJ, 2010).

[11]Kernighan, Brian Wilson, Some graph partitioning problems related to program segmentation (Princeton University, 1969).

[12]Kernighan, B.W. and Lin, S., “An efficient heuristic procedure for partitioning graphs,” Bell System Technical Journal 49, 291–307 (1970).

[13]Bichot, Charles-Edmond and Siarry, Patrick, Graph partitioning (John Wiley & Sons, 2013).

[14]Holland, Paul W., Laskey, Kathryn Blackmond, and Leinhardt, Samuel, “Stochastic blockmodels: First steps,” Social Networks 5, 109–137 (1983).

[15]Karrer, Brian and Newman, M. E. J., “Stochastic blockmodels and community structure in networks,” Physical Review E 83, 016107 (2011).

[16]Peixoto, Tiago P., “Nonparametric Bayesian inference of the microcanonical stochastic block model,” Physical Review E 95, 012317 (2017).

[17]Rissanen, J., “Modeling by shortest data description,” Automatica 14, 465–471 (1978).

[18]Grünwald, Peter D., The Minimum Description Length Principle (The MIT Press, 2007).

[19]Rissanen, Jorma, Information and Complexity in Statistical Modeling, 1st ed. (Springer, 2010).

[20]MacKay, David J. C., Information Theory, Inference and Learning Algorithms, 1st ed. (Cambridge University Press, 2003).

[21]Shannon, C. E., “A mathematical theory of communication,” Bell System Technical Journal 27, 623 (1948).

[22]Zitnik, Marinka, Sosič, Rok, Feldman, Marcus W., and Leskovec, Jure, “Evolution of resilience in protein interactomes across the tree of life,” Proceedings of the National Academy of Sciences 116, 4426–4433 (2019).

[23]Peixoto, Tiago P., “Disentangling Homophily, Community Structure, and Triadic Closure in Networks,” Physical Review X 12, 011004 (2022).

[24]Zhang, Lizhi and Peixoto, Tiago P., “Statistical inference of assortative community structures,” Physical Review Research 2, 043271 (2020).

[25]Pastor-Satorras, Romualdo, Smith, Eric, and Solé, Ricard V., “Evolving protein interaction networks through gene duplication,” Journal of Theoretical Biology 222, 199–210 (2003).

[26]Peixoto, Tiago P., “Revealing Consensus and Dissensus between Network Partitions,” Physical Review X 11, 021003 (2021).

[27]Yan, Xiaoran, Shalizi, Cosma, Jensen, Jacob E., Krzakala, Florent, Moore, Cristopher, Zdeborová, Lenka, Zhang, Pan, and Zhu, Yaojia, “Model selection for degree-corrected block models,” Journal of Statistical Mechanics: Theory and Experiment 2014, P05007 (2014).

[28]Decelle, Aurelien, Krzakala, Florent, Moore, Cristopher, and Zdeborová, Lenka, “Phase transition in the detection of modules in sparse networks,” arXiv:1102.1182 (2011).

[29]Cover, Thomas M. and Thomas, Joy A., Elements of Information Theory (Wiley-Interscience, 1991).

[30]Li, Ming and Vitányi, Paul M. B., An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed. (Springer, New York, 2008).

[31]Snijders, Tom A. B. and Nowicki, Krzysztof, “Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure,” Journal of Classification 14, 75–100 (1997).

[32]Nowicki, Krzysztof and Snijders, Tom A. B., “Estimation and Prediction for Stochastic Blockstructures,” Journal of the American Statistical Association 96, 1077–1087 (2001).

[33]Tallberg, Christian, “A Bayesian Approach to Modeling Stochastic Blockstructures with Covariates,” The Journal of Mathematical Sociology 29, 1–23 (2004).

[34]Hastings, M. B., “Community detection as an inference problem,” Physical Review E 74, 035102 (2006).

[35]Rosvall, Martin and Bergstrom, Carl T., “An information-theoretic framework for resolving community structure in complex networks,” Proceedings of the National Academy of Sciences 104, 7327–7331 (2007).

[36]Airoldi, Edoardo M., Blei, David M., Fienberg, Stephen E., and Xing, Eric P., “Mixed Membership Stochastic Blockmodels,” Journal of Machine Learning Research 9, 1981–2014 (2008).

[37]Clauset, Aaron, Moore, Cristopher, and Newman, M. E. J., “Hierarchical structure and the prediction of missing links in networks,” Nature 453, 98–101 (2008).

[38]Hofman, Jake M. and Wiggins, Chris H., “Bayesian Approach to Network Modularity,” Physical Review Letters 100, 258701 (2008).

[39]Mørup, Morten and Hansen, Lars Kai, “Learning latent structure in complex networks,” in NIPS Workshop on Analyzing Networks and Learning with Graphs (2009).

[40]Boguñá, Marián and Pastor-Satorras, Romualdo, “Class of correlated random networks with hidden variables,” Physical Review E 68, 036112 (2003).

[41]Bollobás, Béla, Janson, Svante, and Riordan, Oliver, “The phase transition in inhomogeneous random graphs,” Random Structures & Algorithms 31, 3–122 (2007).

[42]Lancichinetti, Andrea, Fortunato, Santo, and Radicchi, Filippo, “Benchmark graphs for testing community detection algorithms,” Physical Review E 78, 046110 (2008).

[43]Girvan, M. and Newman, M. E. J., “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences 99, 7821–7826 (2002).

[44]Lancichinetti, Andrea and Fortunato, Santo, “Community detection algorithms: A comparative analysis,” Physical Review E 80, 056117 (2009).

[45]Decelle, Aurelien, Krzakala, Florent, Moore, Cristopher, and Zdeborová, Lenka, “Inference and Phase Transitions in the Detection of Modules in Sparse Networks,” Physical Review Letters 107, 065701 (2011).

[46]Newman, M. E. J., “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences 103, 8577–8582 (2006).

[47]Rosvall, Martin and Bergstrom, Carl T., “Maps of random walks on complex networks reveal community structure,” Proceedings of the National Academy of Sciences 105, 1118–1123 (2008).

[48]Lambiotte, R., Delvenne, J. C., and Barahona, M., “Random Walks, Markov Processes and the Multiscale Modular Organization of Complex Networks,” IEEE Transactions on Network Science and Engineering 1, 76–90 (2014).

[49]Gelman, Andrew, Carlin, John B., Stern, Hal S., Dunson, David B., Vehtari, Aki, and Rubin, Donald B., Bayesian Data Analysis, 3rd ed. (Chapman and Hall/CRC, Boca Raton, 2013).

[50]Bishop, Christopher M., Pattern Recognition and Machine Learning (Springer, 2011).

[51]Newman, M. E. J., “Network structure from rich but noisy data,” Nature Physics 14, 542–545 (2018).

[52]Martin, Travis, Ball, Brian, and Newman, M. E. J., “Structural inference for uncertain networks,” Physical Review E 93, 012306 (2016).

[53]Peixoto, Tiago P., “Reconstructing Networks with Unknown and Heterogeneous Errors,” Physical Review X 8, 041011 (2018).

[54]Guimerà, Roger and Sales-Pardo, Marta, “Missing and spurious interactions and the reconstruction of complex networks,” Proceedings of the National Academy of Sciences 106, 22073–22078 (2009).

[55]Hoffmann, Till, Peel, Leto, Lambiotte, Renaud, and Jones, Nick S., “Community detection in networks without observing edges,” Science Advances 6, eaav1478 (2020).

[56]Peixoto, Tiago P., “Network Reconstruction and Community Detection from Dynamics,” Physical Review Letters 123, 128301 (2019).

[57]Fosdick, B., Larremore, D., Nishimura, J., and Ugander, J., “Configuring Random Graph Models with Fixed Degree Sequences,” SIAM Review 60, 315–355 (2018).

[58]Chung, Fan and Lu, Linyuan, “Connected Components in Random Graphs with Given Expected Degree Sequences,” Annals of Combinatorics 6, 125–145 (2002).

[59]Guimerà, Roger, Sales-Pardo, Marta, and Amaral, Luís A. Nunes, “Modularity from fluctuations in random graphs and complex networks,” Physical Review E 70, 025101 (2004).

[60]Fortunato, Santo and Barthélemy, Marc, “Resolution limit in community detection,” Proceedings of the National Academy of Sciences 104, 36–41 (2007).

[61]Good, Benjamin H., de Montjoye, Yves-Alexandre, and Clauset, Aaron, “Performance of modularity maximization in practical contexts,” Physical Review E 81, 046106 (2010).

[62]Newman, M. E. J., “Mixing patterns in networks,” Physical Review E 67, 026126 (2003).

[63]Riolo, Maria A. and Newman, M. E. J., “Consistency of community structure in complex networks,” Physical Review E 101, 052306 (2020).

[64]Zhang, Lizhi and Peixoto, T. P., “Large-scale assessment of overfitting, underfitting and model selection for modular network structures,” in preparation.

[65]Peixoto, T. P., “The Netzschleuder network catalogue and repository,” (2020), accessible at https://networks.skewed.de.

[66]Ghasemian, Amir, Hosseinmardi, Homa, and Clauset, Aaron, “Evaluating Overfit and Underfit in Models of Network Community Structure,” IEEE Transactions on Knowledge and Data Engineering, 1–1 (2019).

[67]Peixoto, Tiago P., “Hierarchical Block Structures and High-Resolution Model Selection in Large Networks,” Physical Review X 4, 011047 (2014).

[68]Larremore, Daniel B., Clauset, Aaron, and Jacobs, Abigail Z., “Efficiently inferring community structure in bipartite networks,” Physical Review E 90, 012805 (2014).

[69]Zhang, Xiao, Martin, Travis, and Newman, M. E. J., “Identification of core-periphery structure in networks,” Physical Review E 91, 032803 (2015).

[70]Zhang, Pan and Moore, Cristopher, “Scalable detection of statistically significant communities and hierarchies, using message passing for modularity,” Proceedings of the National Academy of Sciences 111, 18144–18149 (2014).

[71]Newman, M. E. J., “Equivalence between modularity optimization and maximum likelihood methods for community detection,” Physical Review E 94, 052315 (2016).

[72]Reichardt, Jörg and Bornholdt, Stefan, “Statistical mechanics of community detection,” Physical Review E 74, 016110 (2006).

[73]Arenas, A., Fernández, A., and Gómez, S., “Analysis of the structure of complex networks at different resolution levels,” New Journal of Physics 10, 053039 (2008).

[74]Bickel, Peter J. and Chen, Aiyou, “A nonparametric view of network models and Newman–Girvan and other modularities,” Proceedings of the National Academy of Sciences 106, 21068–21073 (2009).

[75]Newman, M. E. J., “Spectral methods for community detection and graph partitioning,” Physical Review E 88, 042822 (2013).

[76]Massen, Claire P. and Doye, Jonathan P. K., “Thermodynamics of Community Structure,” arXiv:cond-mat/0610077 (2006).

[77]Lancichinetti, Andrea and Fortunato, Santo, “Consensus clustering in complex networks,” Scientific Reports 2, 1–7 (2012).

[78]Reichardt, Jörg and Bornholdt, Stefan, “When are networks truly modular?” Physica D: Nonlinear Phenomena 224, 20–26 (2006).

[79]Hu, Dandan, Ronhovde, Peter, and Nussinov, Zohar, “Phase transitions in random Potts systems and the community detection problem: spin-glass type and dynamic perspectives,” Philosophical Magazine 92, 406–445 (2012).

[80]Kirkley, Alec and Newman, M. E. J., “Representative community divisions of networks,” Communications Physics 5, 1–10 (2022).

[81]Foster, David V., Foster, Jacob G., Grassberger, Peter, and Paczuski, Maya, “Clustering drives assortativity and community structure in ensembles of networks,” Physical Review E 84, 066117 (2011).

[82]Lancichinetti, Andrea and Fortunato, Santo, “Limits of modularity maximization in community detection,” Physical Review E 84, 066122 (2011).

[83]Granell, Clara, Gómez, Sergio, and Arenas, Alex, “Hierarchical multiresolution method to overcome the resolution limit in complex networks,” International Journal of Bifurcation and Chaos 22, 1250171 (2012).

[84]Kawamoto, Tatsuro and Rosvall, Martin, “Estimating the resolution limit of the map equation in community detection,” Physical Review E 91, 012809 (2015).

[85]Peixoto, Tiago P., “Parsimonious Module Inference in Large Networks,” Physical Review Letters 110, 148701 (2013).

[86]Barber, Michael J., “Modularity and community detection in bipartite networks,” arXiv:0707.1616 (2007).

[87]MacMahon, Mel and Garlaschelli, Diego, “Community Detection for Correlation Matrices,” Physical Review X 5, 021006 (2015).

[88]Traag, V. A. and Bruggeman, Jeroen, “Community detection in networks with positive and negative links,” Physical Review E 80, 036115 (2009).

[89]Expert, Paul, Evans, Tim S., Blondel, Vincent D., and Lambiotte, Renaud, “Uncovering space-independent communities in spatial networks,” Proceedings of the National Academy of Sciences 108, 7663–7668 (2011).

[90]Hric, Darko, Peixoto, Tiago P., and Fortunato, Santo, “Network Structure, Metadata, and the Prediction of Missing Nodes and Annotations,” Physical Review X 6, 031038 (2016).

[91]Newman, M. E. J. and Clauset, Aaron, “Structure and inference in annotated networks,” Nature Communications 7, 11863 (2016).

[92]Peel, Leto, Larremore, Daniel B., and Clauset, Aaron, “The ground truth about metadata and community detection in networks,” Science Advances 3, e1602548 (2017).

[93]Hu, Y., “Efficient, high-quality force-directed graph drawing,” Mathematica Journal 10, 37–71 (2005).

[94]Noack, Andreas, “Modularity clustering is force-directed layout,” Physical Review E 79, 026102 (2009).

[95]Wolpert, David H. and Macready, William G., No free lunch theorems for search, Technical Report SFI-TR-95-02-010 (Santa Fe Institute, 1995).

[96]Wolpert, David H., “The Lack of A Priori Distinctions Between Learning Algorithms,” Neural Computation 8, 1341–1390 (1996).

[97]Wolpert, David H. and Macready, William G., “No free lunch theorems for optimization,” IEEE Transactions on Evolutionary Computation 1, 67–82 (1997).

[98]Schaffer, Cullen, “A Conservation Law for Generalization Performance,” in Machine Learning Proceedings 1994, edited by Cohen, William W. and Hirsh, Haym (Morgan Kaufmann, San Francisco (CA), 1994) pp. 259–265.

[99]Streeter, Matthew J., “Two Broad Classes of Functions for Which a No Free Lunch Result Does Not Hold,” in Genetic and Evolutionary Computation – GECCO 2003, Lecture Notes in Computer Science, edited by Cantú-Paz, Erick, Foster, James A., Deb, Kalyanmoy, Davis, Lawrence David, Roy, Rajkumar, O’Reilly, Una-May, Beyer, Hans-Georg, Standish, Russell, Kendall, Graham, Wilson, Stewart, Harman, Mark, Wegener, Joachim, Dasgupta, Dipankar, Potter, Mitch A., Schultz, Alan C., Dowsland, Kathryn A., Jonoska, Natasha, and Miller, Julian (Springer, Berlin, Heidelberg, 2003) pp. 1418–1430.

[100]McGregor, Simon, “No free lunch and algorithmic randomness,” in GECCO, Vol. 6 (2006) pp. 2–4.

[101]Everitt, Tom, “Universal induction and optimisation: No free lunch?” unpublished master’s thesis, Stockholms Universitet (2013).

[102]Lattimore, Tor and Hutter, Marcus, “No Free Lunch versus Occam’s Razor in Supervised Learning,” in Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence: Papers from the Ray Solomonoff 85th Memorial Conference, Melbourne, VIC, Australia, November 30 – December 2, 2011, Lecture Notes in Computer Science, edited by Dowe, David L. (Springer, Berlin, Heidelberg, 2013) pp. 223–235.

[103]Schurz, Gerhard, Hume’s Problem Solved: The Optimality of Meta-Induction, illustrated ed. (The MIT Press, Cambridge, Massachusetts, 2019).

[104]Jaynes, E. T., Probability Theory: The Logic of Science, edited by Bretthorst, G. Larry (Cambridge University Press, Cambridge, UK; New York, NY, 2003).

[105]Solomonoff, R. J., “A formal theory of inductive inference. Part I,” Information and Control 7, 1–22 (1964).

[106]Hutter, Marcus, “On universal prediction and Bayesian confirmation,” Theoretical Computer Science 384, 33–48 (2007).

[107]Hutter, Marcus, “Open Problems in Universal Induction & Intelligence,” Algorithms 2, 879–906 (2009).

[108]Montanez, George D., “Why machine learning works,” unpublished Ph.D. thesis, Carnegie Mellon University, Pittsburgh (2017).

[109]Vallès-Català, Toni, Peixoto, Tiago P., Sales-Pardo, Marta, and Guimerà, Roger, “Consistencies and inconsistencies between model selection and link prediction in networks,” Physical Review E 97, 062316 (2018).

[110]Ghasemian, Amir, Hosseinmardi, Homa, Galstyan, Aram, Airoldi, Edoardo M., and Clauset, Aaron, “Stacking models for nearly optimal link prediction in complex networks,” Proceedings of the National Academy of Sciences 117, 23393–23400 (2020).

[111]Olhede, Sofia C. and Wolfe, Patrick J., “Network histograms and universality of blockmodel approximation,” Proceedings of the National Academy of Sciences 111, 14722–14727 (2014).

[112]Young, Jean-Gabriel, St-Onge, Guillaume, Desrosiers, Patrick, and Dubé, Louis J., “Universality of the stochastic block model,” Physical Review E 98, 032309 (2018).

[113]Hoff, Peter D., Raftery, Adrian E., and Handcock, Mark S., “Latent Space Approaches to Social Network Analysis,” Journal of the American Statistical Association 97, 1090–1098 (2002).

[114]Ziv, J. and Lempel, A., “A universal algorithm for sequential data compression,” IEEE Transactions on Information Theory 23, 337–343 (1977).

[115]Gelman, Andrew, Vehtari, Aki, Simpson, Daniel, Margossian, Charles C., Carpenter, Bob, Yao, Yuling, Kennedy, Lauren, Gabry, Jonah, Bürkner, Paul-Christian, and Modrák, Martin, “Bayesian Workflow,” arXiv:2011.01808 (2020).

[116]Blondel, Vincent D., Guillaume, Jean-Loup, Lambiotte, Renaud, and Lefebvre, Etienne, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment 2008, P10008 (2008).

[117]Traag, V. A., Waltman, L., and van Eck, N. J., “From Louvain to Leiden: guaranteeing well-connected communities,” Scientific Reports 9, 5233 (2019).

[118]Peixoto, Tiago P., “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models,” Physical Review E 89, 012804 (2014).

[119]Peixoto, Tiago P., “Merge-split Markov chain Monte Carlo for community detection,” Physical Review E 102, 012305 (2020).

[120]Krzakala, Florent, Moore, Cristopher, Mossel, Elchanan, Neeman, Joe, Sly, Allan, Zdeborová, Lenka, and Zhang, Pan, “Spectral redemption in clustering sparse networks,” Proceedings of the National Academy of Sciences 110, 20935–20940 (2013).

[121]Kawamoto, Tatsuro, “Algorithmic detectability threshold of the stochastic block model,” Physical Review E 97, 032301 (2018).

[122]Spielman, Daniel A. and Teng, Shang-Hua, “Spectral partitioning works: planar graphs and finite element meshes,” Linear Algebra and its Applications 421, 284–305 (2007).

[123]von Luxburg, Ulrike, “A tutorial on spectral clustering,” Statistics and Computing 17, 395–416 (2007).

[124]Rohe, Karl, “Spectral clustering and the high-dimensional stochastic blockmodel,” The Annals of Statistics 39, 1878–1915 (2011).

[125]Lehoucq, R. B. and Sorensen, D. C., “Deflation Techniques for an Implicitly Restarted Arnoldi Iteration,” SIAM Journal on Matrix Analysis and Applications 17, 789–821 (1996).

[126]Fire, Michael, Puzis, Rami, and Elovici, Yuval, “Link Prediction in Highly Fractional Data Sets,” in Handbook of Computational Approaches to Counterterrorism, edited by Subrahmanian, V.S. (Springer, New York, NY, 2013) pp. 283–300.

[127]Lehoucq, Richard B., Sorensen, Danny C., and Yang, Chao, ARPACK users’ guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods (SIAM, 1998).

[128]Schwarz, Gideon, “Estimating the Dimension of a Model,” The Annals of Statistics 6, 461–464 (1978).

[129]Akaike, H., “A new look at the statistical model identification,” IEEE Transactions on Automatic Control 19, 716–723 (1974).

[130]Côme, Etienne and Latouche, Pierre, “Model selection and clustering in stochastic block models based on the exact integrated complete data likelihood,” Statistical Modelling 15, 564–589 (2015).

[131]Newman, M. E. J. and Reinert, Gesine, “Estimating the Number of Communities in a Network,” Physical Review Letters 117, 078301 (2016).

[132]Burnham, Kenneth P. and Anderson, David R., eds., Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (Springer, New York, NY, 2002).
