Skip to main content
×
Home

Uncovering the structure and temporal dynamics of information propagation

  • MANUEL GOMEZ RODRIGUEZ (a1), JURE LESKOVEC (a2), DAVID BALDUZZI (a3) and BERNHARD SCHÖLKOPF (a4)
Abstract
Abstract

Time plays an essential role in the diffusion of information, influence, and disease over networks. In many cases we can only observe when a node is activated by a contagion—when a node learns about a piece of information, makes a decision, adopts a new behavior, or becomes infected with a disease. However, the underlying network connectivity and transmission rates between nodes are unknown. Inferring the underlying diffusion dynamics is important because it leads to new insights and enables forecasting, as well as influencing or containing information propagation. In this paper we model diffusion as a continuous temporal process occurring at different rates over a latent, unobserved network that may change over time. Given information diffusion data, we infer the edges and dynamics of the underlying network. Our model naturally imposes sparse solutions and requires no parameter tuning. We develop an efficient inference algorithm that uses stochastic convex optimization to compute online estimates of the edges and transmission rates. We evaluate our method by tracking information diffusion among 3.3 million mainstream media sites and blogs, and experiment with more than 179 million different instances of information spreading over the network in a one-year period. We apply our network inference algorithm to the top 5,000 media sites and blogs and report several interesting observations. First, information pathways for general recurrent topics are more stable across time than for on-going news events. Second, clusters of news media sites and blogs often emerge and vanish in a matter of days for on-going news events. Finally, major events, for example, large scale civil unrest as in the Libyan civil war or Syrian uprising, increase the number of information pathways among blogs, and also increase the network centrality of blogs and social media sites.

Copyright
References
Hide All
Aalen O. O., Borgan Ø., & Gjessing H. K. (2008). Survival and event history analysis: A process point of view. New York, NY: Springer-Verlag.
Adar E., & Adamic L. A. (2005). Tracking information epidemics in blogspace. In Proceedings of The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (pp. 207214). Washington, DC: IEEE.
Agarwal A., & Duchi J. C. (2011). Distributed delayed stochastic optimization. In Shawe-Taylor J., Zemel R., Bartlett P., Pereira F., & Weinberger K. (Eds.), Advances in Neural Information Processing Systems 24 (NIPS-24) (pp. 451459). NIPS Foundation.
Aral S., & Walker D. (2012). Identifying influential and susceptible members of social networks. Science, 337 (6092), 337341.
Bach F., & Moulines E. (2011). Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In Shawe-Taylor J., Zemel R., Bartlett P., Pereira F., & Weinberger K. (Eds.), Advances in Neural Information Processing Systems 24 (NIPS-24) (pp. 451459). NIPS Foundation.
Barabási A.-L., & Albert R. (1999). Emergence of scaling in random networks. Science, 286, 509512.
Blatt D., Hero A. O., & Gauchman H. (2008). A convergent incremental gradient method with a constant step size. SIAM Journal on Optimization, 18 (1), 2951.
Boyd S. P., & Vandenberghe L. (2004). Convex optimization. Cambridge, UK: Cambridge University Press.
Brockmann D., Hufnagel L., & Geisel T. (2006). The scaling laws of human travel. Nature, 439 (7075), 462465.
Chierichetti F., Kleinberg J., & Liben-Nowell D. (2011). Reconstructing patterns of information diffusion from incomplete observations. In Shawe-Taylor J., Zemel R., Bartlett P., Pereira F., & Weinberger K. (Eds.), Advances in Neural Information Processing Systems 24 (NIPS-24) (pp. 792800). NIPS Foundation.
Clauset A., Moore C., & Newman M. E. J. (2008). Hierarchical structure and the prediction of missing links in networks. Nature, 453 (7191), 98101.
Du N., Song L., Gomez-Rodriguez M., & Zha H. (2013). Scalable influence estimation in continuous-time diffusion networks. In Burges C. J. C., Bottou L., Welling M., Ghahramani Z., & Weinberger K. Q. (Eds.), Advances in Neural Information Processing Systems 25 (NIPS-25) (pp. 27892797). NIPS Foundation.
Du N., Song L., Smola A., & Yuan M. (2012). Learning networks of heterogeneous influence. In Barlett P., Pereira F. C. N., Burges C. J. C., Bottou L., & Weinberger K. Q. (Eds.), Advances in Neural Information Processing Systems 25 (NIPS-25) (pp. 27892797). NIPS Foundation.
Duchi J. C., Agarwal A., Johansson M., & Jordan M. I. (2011). Ergodic subgradient descent. Proceedings of the 40th Annual Allerton Conference on Communication, Control, and Computing (pp. 701706). IEEE.
Erdős P., & Rényi A. (1960). On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Science Series B, 5, 1767.
Gomez-Rodriguez M., Leskovec J., & Krause A. (2010). Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (pp. 10191028). ACM.
Gomez-Rodriguez M., Leskovec J., & Krause A. (2012). Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data, 5 (4), 21:137, ACM.
Gomez-Rodriguez M., & Schölkopf B. (2012a). Influence maximization in continuous time diffusion networks. In Langford John & Pineau Joelle (Eds.), Proceedings of the 29th International Conference on Machine Learning (pp. 313320). Omnipress.
Gomez-Rodriguez M., & Schölkopf B. (2012b). Modeling information propagation with survival theory. In Bartlett P., Pereira F. C. N., Burges C. J. C., Bottou L., & Weinberger K. Q. (Eds.), Advances in Neural Information Processing Systems 25 (NIPS-25): Workshop in Algorithmic and Statistical Approaches for Large Social Networks. NIPS Foundation.
Gomez-Rodriguez M., & Schölkopf B. (2012c). Submodular inference of diffusion networks from multiple trees. In Langford John & Pineau Joelle (Eds.), Proceedings of the 29th International Conference on Machine Learning (pp. 489496). Omnipress.
Grant M., & Boyd S. (2010). CVX: Matlab software for disciplined convex programming, version 1.21. Retrieved from http://cvxr.com/cvx (2013).
Hufnagel L., Brockmann D., & Geisel T. (2004). Forecast and control of epidemics in a globalized world. Proceedings of the National Academy of Sciences of the United States of America, 101 (42), 15124.
InfoPath. (2013). InfoPath. Retrieved from http://snap.stanford.edu/infopath/ (2013).
Kaplan E. H. (1989). What are the risks of risky sex? Modeling the AIDS epidemic. Operations Research, 37 (2), 198209.
Kempe D., Kleinberg J. M. & Tardos É. (2003). Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 137146). ACM.
Lappas T., Terzi E., Gunopulos D., & Mannila H. (2010). Finding effectors in social networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 10591068). ACM.
Lawless J. F. (1982). Statistical models and methods for lifetime data. New York, NY: Wiley.
Leskovec J., Adamic L. A., & Huberman B. A. (2006). The dynamics of viral marketing. In Proceedings of the 7th ACM Conference on Electronic Commerce (pp. 228237). ACM.
Leskovec J., Backstrom L., & Kleinberg J. (2009). Meme-tracking and the dynamics of the news cycle. In Proceedings of the 15th ACM Sigkdd International Conference on Knowledge Discovery and Data Mining (pp. 497506). New York, NY: ACM.
Leskovec J., Chakrabarti D., Kleinberg J., Faloutsos C., & Ghahramani Z. (2010). Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research, 11, 9851042.
Leskovec J., Krause A., Guestrin C., Faloutsos C., VanBriesen J., & Glance N. (2007b). Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM Sigkdd International Conference on Knowledge Discovery and Data Mining (pp. 420–429).
Leskovec J., Lang K. J., Dasgupta A., & Mahoney M. W. (2008). Statistical properties of community structure in large social and information networks. In Proceedings of the 17th International Conference on World Wide Web (pp. 695704). ACM.
Leskovec J., McGlohon, M., Faloutsos, C., Glance, N., & Hurst M. (2007a). Cascading behavior in large blog graphs. In Proceedings of the SIAM Conference on Data Mining (pp. 551556). SIAM.
Liben-Nowell D., & Kleinberg J. (2008). Tracing the flow of information on a global scale using Internet chain-letter data. Proceedings of the National Academy of Sciences, 105 (12), 46334638.
Lipsitch M., Cohen T., Cooper B., Robins J. M., Ma S., James L., . . . Murray M. (2003). Transmission dynamics and control of severe acute respiratory syndrome. Science, 300 (5627), 1966.
Merriam-Webster's collegiate dictionary. 2004. Springfield, MA: Merriam-Webster.
Miller M., Sathi C., Wiesenthal D., Leskovec J., & Potts C. (2011). Sentiment flow through hyperlink networks. In Proceedings of the 5th International AAAI Conference on Weblogs and Social Media. AAAI.
Myers S., & Leskovec J. (2010). On the convexity of latent social network inference. In Lafferty J., Williams C. K. I., Shawe-Taylor J., Zemel R. S., & Culotta A. (Eds.), Advances in Neural Information Processing Systems 23 (NIPS-23) (pp. 17411749). NIPS Foundation.
Myers S., & Leskovec J. (2012). Clash of the contagions: Cooperation and competition in information diffusion. Proceedings of the IEEE International Conference on Data Mining (pp. 539548). IEEE.
Myers S., Leskovec J., & Zhu C. (2012). Information diffusion and external influence in networks. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 3341). ACM.
Nemirovski A., Juditsky A., Lan G., & Shapiro A. (2009). Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19 (4), 1574.
Netrapalli P., & Sanghavi S. (2012). Finding the graph of epidemic cascades. In Proceedings of the 12th ACM SIGMETRICS/Performance. PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems (pp. 211222). ACM.
NetRate. (2011). NetRate. Retrieved from http://people.tue.mpg.de/manuelgr/netrate/ (2013).
Newey W. K., & McFadden D. L. (1994). Large sample estimation and hypothesis testing. In Engle R. F. & McFadden D. L. (Eds.), Handbook of econometrics, Vol. IV (pp. 21112245). Amsterdam, Netherlands: Elsevier Science B.V.
Prakash B. A., Beutel A., Rosenfeld R., & Faloutsos C. (2012). Winner takes all: Competing viruses or ideas on fair-play networks. In Proceedings of the 21st International Conference on World Wide Web (pp. 10371046). ACM.
Robbins H., & Monro S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22 (3), 400407.
Rogers E. M. (1995). Diffusion of innovations (4th ed.). New York, NY: Free Press.
Romero D. M., Meeder B., & Kleinberg J. (2011). Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on twitter. In Proceedings of the 20th International Conference on World Wide Web (pp. 695704). ACM.
Roux N. L., Schmidt M., & Bach F. (2012). A stochastic gradient method with an exponential convergence rate for strongly-convex optimization with finite training sets. In Bartlett P., Pereira F. C. N., Burges C. J. C., Bottou L., & Weinberger K. Q. (Eds.), Advances in Neural Information Processing Systems 25 (NIPS-25) (pp. 26722680). NIPS Foundation.
Sadikov S., Medina M., Leskovec J., & Garcia-Molina H. (2011). Correcting for missing data in information cascades. In Proceedings of the 4th ACM International Conference on Web Search and Data Mining (pp. 5564). ACM.
SNAP. (2012). SNAP: Stanford network analysis platform. Retrieved from http://snap.stanford.edu (2013).
Snowsill T. M., Fyson N., De Bie T., & Cristianini N. (2011). Refining causality: Who copied from whom? In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 466474). ACM.
Vu D. Q., Asuncion A. U., Hunter D. R., & Smyth P. (2011). Continuous-time regression models for longitudinal networks. In Shawe-Taylor J., Zemel R., Bartlett P., Pereira F., & Weinberger K. (Eds.), Advances in Neural Information Processing Systems 24 (NIPS-24) (pp. 24922500). NIPS Foundation.
Wallinga J., & Teunis P. (2004). Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. American Journal of Epidemiology, 160 (6), 509516.
Wang C., Knight J. C., & Elder M. C. (2000). On computer viral infection and the effect of immunization. In Proceedings of the 16th Annual Conference on Computer Security Applications (pp. 246256). IEEE Computer Society.
Watts D. J., & Dodds P. S. (2007). Influentials, networks, and public opinion formation. Journal of Consumer Research, 34 (4), 441458.
Recommend this journal

Email your librarian or administrator to recommend adding this journal to your organisation's collection.

Network Science
  • ISSN: 2050-1242
  • EISSN: 2050-1250
  • URL: /core/journals/network-science
Please enter your name
Please enter a valid email address
Who would you like to send this to? *
×

Keywords:

Metrics

Altmetric attention score

Full text views

Total number of HTML views: 0
Total number of PDF views: 115 *
Loading metrics...

Abstract views

Total abstract views: 532 *
Loading metrics...

* Views captured on Cambridge Core between September 2016 - 23rd November 2017. This data will be updated every 24 hours.