Skip to main content
×
×
Home

Seeded PageRank solution paths

  • D. F. GLEICH (a1) and K. KLOSTER (a2)
Abstract

We study the behaviour of network diffusions based on the PageRank random walk from a set of seed nodes. These diffusions are known to reveal small, localized clusters (or communities), and also large macro-scale clusters by varying a parameter that has a dual-interpretation as an accuracy bound and as a regularization level. We propose a new method that quickly approximates the result of the diffusion for all values of this parameter. Our method efficiently generates an approximate solution path or regularization path associated with a PageRank diffusion, and it reveals cluster structures at multiple size-scales between small and large. We formally prove a runtime bound on this method that is independent of the size of the network, and we investigate multiple optimizations to our method that can be more practical in some settings. We demonstrate that these methods identify refined clustering structure on a number of real-world networks with up to 2 billion edges.

Copyright
References
Hide All
[1] Andersen R. & Chung F. (2007) Detecting sharp drops in pagerank and a simplified local partitioning algorithm. In Cai Jin-Yi, Cooper S. Barry, and Zhu Hong (Eds). Theory and Applications of Models of Computation, Springer-verlag, Berlin Heidelberg, pp. 112.
[2] Andersen R., Chung F. & Lang K. (2006) Local graph partitioning using PageRank vectors. In: FOCS.
[3] Boldi P., Bonchi F., Castillo C., Donato D., Gionis A. & Vigna S. (2008) The query-flow graph: Model and applications. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM '08, New York, NY, USA, ACM, pp. 609618.
[4] Boldi P., Rosa M., Santini M. & Vigna S. (March 2011) Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th WWW2011, pp. 587–596.
[5] Boldi P., Santini M. & Vigna S. (2009) PageRank: Functional dependencies. ACM Trans. Inf. Syst. 27 (4), 123.
[6] Brezinski C., Redivo-Zaglia M. & Serra-Capizzano S. (March 2005) Extrapolation methods for pagerank computations. Comptes Rendus Mathematique 340 (5), 393397.
[7] Chierichetti F., Kumar R., Lattanzi S., Mitzenmacher M., Panconesi A. & Raghavan P. (2009) On compressing social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, New York, NY, USA, ACM, pp. 219228.
[8] Delvenne J.-C., Yaliraki S. N. & Barahona M. (June 2010) Stability of graph communities across time scales. Proc. Natl. Acad. Sci. 107 (29), 1275512760.
[9] Efron B., Hastie T., Johnstone I. & Tibshirani R. (2004) Least angle regression. Ann. Statist. 32 (2), 407499.
[10] Ghosh R., Teng S.-H., Lerman K. & Yan X. (2014) The interplay between dynamics and networks: Centrality, communities, and cheeger inequality. In: KDD, pp. 1406–1415.
[11] Gleich D. F. (August 2015) PageRank beyond the web. SIAM Rev. 57 (3), 321363.
[12] Gleich D. F. & Mahoney M. M. (2014) Algorithmic anti-differentiation: A case study with min-cuts, spectral, and flow. In: ICML, pp. 1018–1025.
[13] Gleich D. F. & Mahoney M. W. (2015) Using local spectral methods to robustify graph-based learning algorithms. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, New York, NY, USA, ACM, pp. 359368.
[14] Gleich D. F. & Seshadhri C. (2012) Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In: KDD, pp. 597–605.
[15] Gutierrez-Bunster T., Stege U., Thomo A. & Taylor J. (2014) How do biological networks differ from social networks? (an experimental study). In: ASONAM, pp. 744–751.
[16] Hastie T., Tibshirani R. & Friedman J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, New York, Springer.
[17] Hocking T., Vert J.-P., Joulin A. & Bach F. R. (2011) Clusterpath: An algorithm for clustering using convex fusion penalties. In: ICML, pp. 745–752.
[18] Jeub L. G. S., Balachandran P., Porter M. A., Mucha P. J. & Mahoney M. W. (January 2015) Think locally, act locally: Detection of small, medium-sized, and large communities in large networks. Phys. Rev. E 91, 012821.
[19] Kloster K. & Gleich D. F. (2014) Heat kernel based community detection. In: KDD, pp. 1386–1395.
[20] Kwak H., Lee C., Park H. & Moon S. (2010) What is Twitter, a social network or a news media? In: WWW '10: Proceedings of the 19th International Conference on World Wide Web, New York, NY, USA, ACM, pp. 591600.
[21] Langville A. N. & Meyer C. D. (2006) Google's PageRank and Beyond: The Science of Search Engine Rankings, Princeton, NJ, Princeton University Press.
[22] Leskovec J., Lang K. J., Dasgupta A. & Mahoney M. W. (September 2009) Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6 (1), 29123.
[23] Lindsten F., Ohlsson H. & Ljung L. (2011) Just Relax and Come Clustering! A Convexification of k-Means Clustering, Technical Report, Linköpings Universitet.
[24] Mahoney M. W., Orecchia L. & Vishnoi N. K. (August 2012) A local spectral method for graphs: With applications to improving graph partitions and exploring data graphs locally. J. Mach. Learn. Res 13, 23392365.
[25] Mislove A., Marcon M., Gummadi K. P., Druschel P. & Bhattacharjee B. (2007) Measurement and analysis of online social networks. In: Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, IMC '07, New York, NY, USA, ACM, pp. 2942.
[26] Newman M. E. J. (September 2006) Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74 (3), 036104.
[27] Poole K. T. (2011) Vote View. URL: http://voteview.com/. [Accessed 11/10/2015].
[28] Schaeffer S. E. (2007) Graph clustering. Comput. Sci. Rev. 1 (1), 2764.
[29] C. (The Cooperative Association for Internet Data Analyais). (2005) Network datasets. Accessed 2005. URL: http://www.caida.org/tools/measurement/skitter/router_topology/
[30] Whang J. J., Gleich D. F. & Dhillon I. S. (2013) Overlapping community detection using seed set expansion. In: CIKM, pp. 2099–2108.
[31] Wilson C., Boe B., Sala A., Puttaswamy K. P. & Zhao B. Y. (2009) User interactions in social networks and their implications. In: EuroSys, pp. 205–218.
[32] Xie J., Kelley S. & Szymanski B. K. (2013) Overlapping community detection in networks: The state-of-the-art and comparative study. ACM Comput. Surv. 45 (4), 43:1–43:35.
[33] Yang J. & Leskovec J. (December 2012) Defining and evaluating network communities based on ground-truth. In: IEEE 12th International Conference on Data Mining (ICDM), pp. 745–754.
[34] Zhou D., Bousquet O., Lal T. N., Weston J. & Schölkopf B. (2003) Learning with local and global consistency. In: Advances in Neural Information Processing Systems (NIPS), Vol. 16, pp. 321–328.
Recommend this journal

Email your librarian or administrator to recommend adding this journal to your organisation's collection.

European Journal of Applied Mathematics
  • ISSN: 0956-7925
  • EISSN: 1469-4425
  • URL: /core/journals/european-journal-of-applied-mathematics
Please enter your name
Please enter a valid email address
Who would you like to send this to? *
×

Keywords:

Metrics

Full text views

Total number of HTML views: 0
Total number of PDF views: 61 *
Loading metrics...

Abstract views

Total abstract views: 206 *
Loading metrics...

* Views captured on Cambridge Core between September 2016 - 21st February 2018. This data will be updated every 24 hours.