Classifying objects or, more generally, recognising patterns is not a simple task for automated procedures, particularly when the objects are of biological interest. For example, identifying species, predicting species distributions and finding gene expression patterns that predict the risk of developing a particular type of tumour are all difficult tasks. In this book the tool that performs the classification is called a classifier, a term that encompasses a wide range of designs, some of which are described in the following chapters.
The first attempts at biological pattern recognition tended to use statistical methods, for example discriminant analysis and logistic regression, but more recently a wider range of techniques has been used, including some that have ‘borrowed’ ideas from biology. Although a wide range of algorithms is covered in these pages, it is impossible to be comprehensive. However, the techniques that have been included should help readers to understand the logic of other techniques.
Jain et al. (2000) recognised four distinct approaches to pattern recognition. The first approach, and the main theme of their paper, was statistical. These methods find decision boundaries by making use of information about the class probability distributions. The second approach is one of the simplest and relies on matching cases to class templates or exemplars. The third is little used in biology and relies on decomposing patterns into sub-patterns, which may also be broken down further. This hierarchical approach is called syntactic because of its similarity to the structure of language.
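The difference between the first two approaches can be illustrated with a minimal sketch. It is not taken from Jain et al. (2000): the data, the class names and the function names are all hypothetical, and the statistical classifier assumes, purely for illustration, that each class follows a normal distribution on a single measured variable. The template classifier simply assigns a case to the class of its closest stored exemplar.

```python
import math
from statistics import mean, stdev

# Hypothetical training data: one measurement (say, wing length in mm)
# recorded for five individuals from each of two invented classes.
training = {
    "species_a": [10.1, 9.8, 10.4, 10.0, 9.7],
    "species_b": [12.9, 13.4, 12.6, 13.1, 13.0],
}


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and s.d. sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def classify_statistical(x, data):
    """Statistical approach (assumed Gaussian classes): estimate each
    class distribution from the data and choose the class with the
    largest value of prior * likelihood."""
    n_total = sum(len(values) for values in data.values())
    scores = {}
    for label, values in data.items():
        prior = len(values) / n_total
        scores[label] = prior * gaussian_pdf(x, mean(values), stdev(values))
    return max(scores, key=scores.get)


def classify_template(x, data):
    """Template approach: assign the case to the class of the nearest
    stored exemplar, with no distributional assumptions."""
    best_label, best_distance = None, math.inf
    for label, values in data.items():
        for exemplar in values:
            distance = abs(x - exemplar)
            if distance < best_distance:
                best_label, best_distance = label, distance
    return best_label


if __name__ == "__main__":
    for case in (10.2, 12.8):
        print(case, classify_statistical(case, training),
              classify_template(case, training))
```

With well-separated classes such as these the two classifiers agree. They can disagree when the classes overlap, because the statistical classifier depends on the estimated distributions whereas the template classifier depends only on the individual stored cases.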