
(Un/Semi-)supervised SMS text message SPAM detection

Published online by Cambridge University Press: 15 October 2014

CHRIS R. GIANNELLA and RANSOM WINDER
The MITRE Corporation, 7515 Colshire Drive, McLean, VA 22102, USA; email: cgiannella@mitre.org, rwinder@mitre.org

BRANDON WILSON
Department of Computer Science, University of Maryland, College Park, MD 20742, USA; email: bswilson@cs.umd.edu

Abstract

We address the problem of unsupervised and semi-supervised SMS (Short Message Service) text message SPAM detection. We develop a content-based Bayesian classification approach that modestly extends the technique discussed by Resnik and Hardisty (2010). The approach assumes that the bodies of SMS messages arise from a probabilistic generative model, and it estimates the model parameters by Gibbs sampling over an unlabeled or partially labeled training corpus of SMS messages. It classifies new SMS messages as SPAM or HAM (non-SPAM) by thresholding their estimated logits (log-odds of SPAM versus HAM) at zero. We tested the approach on a publicly available SMS corpus collected in the UK. Used in a semi-supervised fashion, the approach clearly outperformed a competing algorithm, Semi-Boost. Used in an unsupervised fashion, the approach outperformed a fully supervised classifier, an SVM (Support Vector Machine), when the SVM was trained on only a small number of messages, and performed comparably otherwise. We believe the approach works well and is a useful tool for SMS SPAM detection.
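To make the pipeline in the abstract concrete, here is a minimal sketch assuming a two-class multinomial naive Bayes generative model in the spirit of Resnik and Hardisty (2010). It is an illustration, not the authors' method: every name and hyperparameter (`gibbs_naive_bayes`, `alpha`, `beta_prior`, `iters`, `burn_in`) is hypothetical, and the sampler draws the SPAM prior and the word distributions explicitly instead of integrating any of them out, which may differ from the sampler in the paper. Labeled messages keep their labels (the semi-supervised case), and unlabeled ones are resampled. A new message is then scored by averaging its logit over the kept posterior samples and thresholding at zero. Note that with no labels at all, a two-class mixture does not by itself identify which cluster is SPAM. This sketch relies on at least a few labeled messages to anchor the classes.

```python
import math
import random
from collections import Counter


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logit(cnt, pi, theta):
    """log P(SPAM | msg) - log P(HAM | msg) under one parameter sample.

    theta[0] is the HAM word distribution, theta[1] the SPAM one.
    """
    lg = math.log(pi) - math.log(1.0 - pi)
    for w, n in cnt.items():
        lg += n * (math.log(theta[1][w]) - math.log(theta[0][w]))
    return lg


def gibbs_naive_bayes(docs, labels, vocab_size, iters=200, burn_in=100,
                      alpha=1.0, beta_prior=(1.0, 1.0), seed=0):
    """Hypothetical Gibbs sampler for two-class multinomial naive Bayes.

    docs:   list of messages, each a list of word ids in range(vocab_size)
    labels: 1 (SPAM), 0 (HAM), or None (unlabeled) for each message
    Returns (pi, theta_ham, theta_spam) samples kept after burn-in.
    """
    rng = random.Random(seed)
    counts = [Counter(d) for d in docs]
    # Labeled messages keep their labels; unlabeled ones start at random.
    z = [l if l is not None else rng.randint(0, 1) for l in labels]
    samples = []
    for it in range(iters):
        # 1. theta_c | labels ~ Dirichlet(alpha + word counts in class c).
        theta = []
        for c in (0, 1):
            post = [alpha] * vocab_size
            for zc, cnt in zip(z, counts):
                if zc == c:
                    for w, n in cnt.items():
                        post[w] += n
            theta.append(sample_dirichlet(post, rng))
        # 2. pi (SPAM prior) | labels ~ Beta(a + #SPAM, b + #HAM).
        n_spam = sum(z)
        pi = rng.betavariate(beta_prior[0] + n_spam,
                             beta_prior[1] + len(z) - n_spam)
        # 3. Resample each unlabeled message's label given pi and theta.
        for j, cnt in enumerate(counts):
            if labels[j] is None:
                p_spam = sigmoid(logit(cnt, pi, theta))
                z[j] = 1 if rng.random() < p_spam else 0
        if it >= burn_in:
            samples.append((pi, theta[0], theta[1]))
    return samples


def classify(msg, samples):
    """Average the logit over posterior samples; > 0 means SPAM (1)."""
    cnt = Counter(msg)
    mean = sum(logit(cnt, pi, (t0, t1)) for pi, t0, t1 in samples)
    return 1 if mean / len(samples) > 0 else 0
```

For example, with a toy vocabulary where ids 0 to 2 stand for "free", "prize", "call" and ids 3 to 5 for "meeting", "lunch", "tomorrow", training on `docs = [[0,1,2],[0,1,1],[1,2,0],[3,4,5],[4,5,3],[3,5],[0,1],[4,3]]` with `labels = [1,1,1,0,0,0,None,None]` and then calling `classify([0,1,2], samples)` should label the message SPAM. Averaging the logit rather than the probability is one of several reasonable ways to turn posterior samples into a single "logit estimate"; it is an assumption here.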

Type: Article
Copyright © Cambridge University Press 2014
