Statistical evaluation of spectral methods for anomaly detection in static networks

Tomilayo Komolafe; A. Valeria Quevedo; Srijan Sengupta; William H. Woodall

doi:10.1017/nws.2019.14

Published online by Cambridge University Press: 23 September 2019

Tomilayo Komolafe*: Statistics Department, Virginia Polytechnic Institute and State University, Blacksburg VA, USA
A. Valeria Quevedo: Statistics Department, Virginia Polytechnic Institute and State University, Blacksburg VA, USA; Faculty of Engineering, Universidad de Piura, Peru (e-mail: anavq@vt.edu)
Srijan Sengupta: Statistics Department, Virginia Polytechnic Institute and State University, Blacksburg VA, USA (e-mail: sengupta@vt.edu)
William H. Woodall: Statistics Department, Virginia Polytechnic Institute and State University, Blacksburg VA, USA (e-mail: bwoodall@vt.edu)
*Corresponding author. Email: tomilayo@vt.edu

Abstract

Anomaly detection in networks has attracted considerable attention in recent years, especially with the rise of connected devices and social networks. Its applications range from detecting terrorist cells in counter-terrorism efforts to identifying unexpected mutations during ribonucleic acid transcription. Accordingly, numerous algorithmic techniques for anomaly detection have been introduced. To date, however, little work has been done to evaluate these algorithms from a statistical perspective. This work addresses that gap in the literature by carrying out a statistical evaluation of a suite of popular spectral methods for anomaly detection in networks. Our investigation of the statistical properties of these algorithms reveals several critical shortcomings, which we address through methodological improvements. Further, we evaluate the performance of these algorithms using simulated networks and extend the methods from binary to count networks.

Keywords

residual matrix; spectral methods; R-MAT model; principal components

Type: Original Article
Information: Network Science, Volume 7, Issue 3, September 2019, pp. 319–352

DOI: https://doi.org/10.1017/nws.2019.14
Copyright: © Cambridge University Press 2019
