Average Jaccard index of random graphs
Published online by Cambridge University Press: 26 February 2024
Abstract
The asymptotic behavior of the Jaccard index in $G(n,p)$, the classical Erdős–Rényi random graph model, is studied as $n$ goes to infinity. We first derive the asymptotic distribution of the Jaccard index of any pair of distinct vertices, as well as the first two moments of this index. Then the average of the Jaccard indices over all vertex pairs in $G(n,p)$ is shown to be asymptotically normal under the additional mild conditions that $np\to\infty$ and $n^2(1-p)\to\infty$.
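To make the quantity in the abstract concrete, the following is a minimal simulation sketch, not the authors' method. It assumes the Jaccard index of two vertices is the size of the intersection of their neighbor sets divided by the size of the union, with open neighborhoods and the value 0 when both vertices are isolated; the article body may use a different convention. The function names `sample_gnp`, `jaccard`, and `average_jaccard`, and the parameters `n = 200`, `p = 0.3`, are illustrative choices.

```python
import itertools
import random


def sample_gnp(n, p, rng):
    """Sample G(n, p): each of the n*(n-1)/2 possible edges appears independently with probability p."""
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def jaccard(adj, u, v):
    """Jaccard index of the neighbor sets of u and v.

    Assumption: open neighborhoods, and 0/0 is treated as 0.
    """
    union = adj[u] | adj[v]
    if not union:
        return 0.0
    return len(adj[u] & adj[v]) / len(union)


def average_jaccard(adj):
    """Average of the Jaccard index over all unordered pairs of distinct vertices."""
    pairs = list(itertools.combinations(range(len(adj)), 2))
    return sum(jaccard(adj, u, v) for u, v in pairs) / len(pairs)


rng = random.Random(0)           # fixed seed so the run is reproducible
adj = sample_gnp(200, 0.3, rng)  # illustrative n and p
avg = average_jaccard(adj)
print(f"average Jaccard index: {avg:.4f}")
```

As a rough sanity check (a heuristic ratio of expectations, not a statement taken from the abstract), a third vertex is adjacent to both endpoints with probability $p^2$ and to at least one with probability $2p-p^2$, which suggests values near $p/(2-p)$ for moderately large $n$. Repeating the simulation over many seeds and plotting a histogram of `avg` is one way to explore the asymptotic normality described above.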
MSC classification
Secondary:
60F05: Central limit and other weak theorems
- © The Author(s), 2024. Published by Cambridge University Press on behalf of Applied Probability Trust