Chapter 3: Basics of Detection and Mixture Models

pp. 56-75

Authors

, Pennsylvania State University, , University of Illinois, Urbana-Champaign, , Pennsylvania State University
Extract

In this chapter, we introduce the design of statistical anomaly detectors. We discuss the types of data encountered in practice: continuous, discrete categorical, and discrete ordinal features. We then discuss how to model such data, in particular to form a null model for statistical anomaly detection, with emphasis on mixture densities. The EM algorithm is developed for estimating the parameters of a mixture density. K-means is a specialization of EM for Gaussian mixtures. The Bayesian information criterion (BIC), which is widely used for estimating the number of components in a mixture density, is discussed and developed. We also discuss parsimonious mixtures, which economize on the number of model parameters in a mixture density by sharing parameters across components. These models allow BIC to obtain accurate model order estimates even when the feature dimensionality is huge and the number of data samples is small, a case where BIC applied to traditional mixtures grossly underestimates the model order. Key performance measures are discussed, including the true positive rate, the false positive rate, and the receiver operating characteristic (ROC) curve with its associated area under the curve (ROC AUC). The density models are used in attack detection defenses in Chapters 4 and 13. The detection performance measures are used throughout the book.
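
As a rough illustration of the EM and BIC ideas summarized above, the following is a minimal sketch and not the chapter's own implementation. It assumes one-dimensional data, full (unshared) per-component variances, a simple quantile-based initialization, and the common BIC form -2 log L + p log n, where lower is better; the book may use a different sign convention. The function names `em_gmm_1d`, `bic`, and `select_order` are hypothetical.

```python
import numpy as np


def _log_joint(x, w, mu, var):
    # log(w_j) + log N(x_i | mu_j, var_j) for every sample i and component j.
    return (np.log(w) - 0.5 * np.log(2.0 * np.pi * var)
            - 0.5 * (x[:, None] - mu) ** 2 / var)


def em_gmm_1d(x, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Assumed initialization: means at evenly spaced quantiles of the data,
    # a common variance equal to the sample variance, and equal weights.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        log_p = _log_joint(x, w, mu, var)
        r = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means, and variances.
        nk = r.sum(axis=0) + 1e-12          # guard against empty components
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6  # variance floor
    # Log-likelihood of the data under the final parameters.
    loglik = np.logaddexp.reduce(_log_joint(x, w, mu, var), axis=1).sum()
    return w, mu, var, loglik


def bic(loglik, k, n):
    # Free parameters in 1-D: k means + k variances + (k - 1) weights.
    n_params = 3 * k - 1
    return -2.0 * loglik + n_params * np.log(n)


def select_order(x, candidates):
    """Return the candidate number of components with the lowest BIC."""
    scores = {k: bic(em_gmm_1d(x, k)[3], k, len(x)) for k in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Synthetic data (an assumption for this demo): two separated Gaussians.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-4.0, 1.0, 300), rng.normal(4.0, 1.0, 300)])
    best_k, scores = select_order(x, [1, 2, 3])
    print("BIC by number of components:", scores)
    print("selected number of components:", best_k)
```

In this sketch the penalty term grows with the number of free parameters, which is why sharing parameters across components, as parsimonious mixtures do, changes the model order that BIC selects.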

Keywords

  • mixture model
  • EM algorithm
  • K-means
  • parsimonious mixtures
  • maximum likelihood
  • Bayesian information criterion
  • anomaly detection
  • receiver operating characteristic
  • p-value
  • principal component analysis
