We have seen several classifiers in the preceding chapters, such as decision trees, the full and naive Bayes classifiers, the nearest neighbors classifier, support vector machines, and so on. In general, we may think of a classifier as a model or function M that predicts the class label ŷ for a given input example x:
ŷ = M(x)
where x = (x1, x2, …, xd)T ∈ Rd is a point in d-dimensional space and ŷ ∈ {c1, c2, …, ck} is its predicted class.
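The view of a classifier as a function M from Rd to the set of class labels can be made concrete with a toy sketch. The rule below is purely hypothetical (a fixed linear decision rule over d = 2 dimensions, not any model from the book); it only illustrates the input/output contract ŷ = M(x).

```python
# Illustrative sketch: a classifier is just a function M mapping a point
# x in R^d to a predicted label y_hat. Here M is a hypothetical fixed
# linear rule over d = 2 dimensions (not a trained model).
def M(x):
    # x = (x1, x2); predict 'c1' on one side of the line x1 + x2 = 1, else 'c2'
    return 'c1' if x[0] + x[1] <= 1.0 else 'c2'

print(M((0.2, 0.3)))  # → c1
print(M((1.5, 0.7)))  # → c2
```

In practice M is not fixed by hand but learned from training data, as discussed next.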
To build the classification model M we need a training set of points together with their known classes. Different classifiers are obtained depending on the assumptions used to construct M. For instance, support vector machines build M from the maximum margin hyperplane, whereas the Bayes classifier directly computes the posterior probability P(cj|x) for each class cj, and predicts the class of x as the one with the maximum posterior probability, ŷ = argmax_cj {P(cj|x)}. Once the model M has been trained, we assess its performance on a separate testing set of points whose true classes are known. Finally, the model can be deployed to predict the class of future points, whose class we typically do not know.
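The train/assess cycle above can be sketched end to end. The following is a minimal illustration, not the book's code: it fits a Bayes-style classifier with one-dimensional Gaussian class-conditional densities (a simplifying assumption made here for brevity), predicts via ŷ = argmax_cj {P(cj|x)}, and then measures accuracy on a held-out testing set with known true classes. The toy data values are invented.

```python
# Sketch (with assumed 1-d Gaussian likelihoods): train a Bayes-style
# classifier on labeled points, then assess it on a separate test set.
import math

def gaussian_pdf(x, mu, var):
    # Gaussian density N(x; mu, var)
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(train):
    # Estimate the prior P(cj) and Gaussian parameters (mean, variance)
    # for each class cj from the training set.
    params, n = {}, len(train)
    for c in set(y for _, y in train):
        xs = [x for x, y in train if y == c]
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs) or 1e-9
        params[c] = (len(xs) / n, mu, var)
    return params

def predict(params, x):
    # y_hat = argmax over classes of P(cj) * P(x|cj), proportional to P(cj|x)
    return max(params, key=lambda c: params[c][0] * gaussian_pdf(x, *params[c][1:]))

# Invented toy data: training set with known classes, separate testing set.
train = [(1.0, 'a'), (1.2, 'a'), (0.9, 'a'), (3.0, 'b'), (3.2, 'b'), (2.9, 'b')]
test = [(1.1, 'a'), (3.1, 'b'), (0.8, 'a')]

M = fit(train)
acc = sum(predict(M, x) == y for x, y in test) / len(test)
print(acc)  # fraction of test points whose predicted class matches the true class
```

The same pattern generalizes to d-dimensional points; the assessment step, comparing ŷ = M(x) against the true class over the test set, is the subject of what follows.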