Information theory was developed in the communications community, but it turns out to be very useful for pattern recognition. In this chapter, we start with an example that develops the idea of uncertainty and its measurement, i.e., entropy. We then introduce a few core results of information theory: entropy, joint and conditional entropy, mutual information, and the relationships among them. We move on to differential entropy for continuous random variables and find the distributions with maximum entropy under certain constraints, which are useful in pattern recognition. Finally, we introduce applications of information theory in our context: maximum entropy learning, minimum cross entropy, feature selection, and decision trees (a widely used family of models for pattern recognition and machine learning).
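As a brief preview of the chapter's central quantity: for a discrete random variable $X$ taking value $x$ with probability $p(x)$, the Shannon entropy is
\[
H(X) = -\sum_{x} p(x) \log_2 p(x),
\]
measured in bits when the logarithm is base 2. A uniform distribution maximizes $H(X)$, while a deterministic outcome gives $H(X) = 0$, matching the intuition that entropy measures uncertainty.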