
Average Jaccard index of random graphs

Published online by Cambridge University Press:  26 February 2024

Qunqiang Feng*
Affiliation:
University of Science and Technology of China
Shuai Guo*
Affiliation:
University of Science and Technology of China
Zhishui Hu*
Affiliation:
University of Science and Technology of China
*Postal address: Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China.

Abstract

The asymptotic behavior of the Jaccard index in G(n, p), the classical Erdős–Rényi random graph model, is studied as n goes to infinity. We first derive the asymptotic distribution of the Jaccard index of any pair of distinct vertices, as well as the first two moments of this index. Then the average of the Jaccard indices over all vertex pairs in G(n, p) is shown to be asymptotically normal under the additional mild condition that $np\to\infty$ and $n^2(1-p)\to\infty$.
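
To make the quantity in the abstract concrete, the following is a minimal sketch of how the average Jaccard index of one G(n, p) sample could be computed. It rests on assumptions not stated on this page: the Jaccard index of two vertices is taken to be the size of the intersection of their open neighborhoods divided by the size of the union, and the index is set to 0 when both neighborhoods are empty. The function names `sample_gnp`, `jaccard`, and `average_jaccard` are hypothetical and chosen only for illustration.

```python
import itertools
import random


def sample_gnp(n, p, rng):
    """Return adjacency sets for one G(n, p) sample on vertices 0..n-1.

    Each of the n*(n-1)/2 possible edges is included independently
    with probability p.
    """
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def jaccard(adj, u, v):
    """Jaccard index of vertices u and v based on open neighborhoods.

    Assumption: the index is 0 when both neighborhoods are empty.
    """
    union = adj[u] | adj[v]
    if not union:
        return 0.0
    return len(adj[u] & adj[v]) / len(union)


def average_jaccard(adj):
    """Average of the Jaccard indices over all pairs of distinct vertices."""
    pairs = list(itertools.combinations(range(len(adj)), 2))
    return sum(jaccard(adj, u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is reproducible
    graph = sample_gnp(200, 0.1, rng)
    print(average_jaccard(graph))
```

Repeating the last three lines over many independent samples and plotting a histogram of the resulting averages is one informal way to look at the asymptotic normality described above; it is only an illustration and does not replace the proof.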

Type
Original Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Applied Probability Trust

