This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification and biosequence analysis, and are now established as one of the standard tools for machine learning and data mining. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and its applications. The concepts are introduced gradually in accessible and self-contained stages, while the presentation is rigorous and thorough. Pointers to relevant literature and web sites containing software ensure that it forms an ideal starting point for further study. Equally, the book and its associated web site will guide practitioners to updated literature, new applications, and on-line software.
As one of the most comprehensive machine learning texts around, this book does justice to the field's incredible richness, but without losing sight of the unifying principles. Peter Flach's clear, example-based approach begins by discussing how a spam filter works, which gives an immediate introduction to machine learning in action, with a minimum of technical fuss. Flach provides case studies of increasing complexity and variety with well-chosen examples and illustrations throughout. He covers a wide range of logical, geometric and statistical models and state-of-the-art topics such as matrix factorisation and ROC analysis. Particular attention is paid to the central role played by features. The use of established terminology is balanced with the introduction of new and useful concepts, and summaries of relevant background material are provided with pointers for revision if necessary. These features ensure Machine Learning will set a new standard as an introductory textbook.
AND SO WE HAVE come to the end of our journey through the ‘making sense of data’ landscape. We have seen how machine learning can build models from features for solving tasks involving data. We have seen how models can be predictive or descriptive; learning can be supervised or unsupervised; and models can be logical, geometric, probabilistic or ensembles of such models. Now that I have equipped you with the basic concepts to understand the literature, there is a whole world out there for you to explore. So it is only natural for me to leave you with a few pointers to areas you may want to learn about next.
One thing that we have often assumed in the book is that the data comes in a form suitable for the task at hand. For example, if the task is to label e-mails we conveniently learn a classifier from data in the form of labelled e-mails. For tasks such as class probability estimation I introduced the output space (for the model) as separate from the label space (for the data) because the model outputs (class probability estimates) are not directly observable in the data and have to be reconstructed. An area where the distinction between data and model output is much more pronounced is reinforcement learning. Imagine you want to learn how to be a good chess player. This could be viewed as a classification task, but then you require a teacher to score every move.
TWO HEADS ARE BETTER THAN ONE – a well-known proverb suggesting that two minds working together can often achieve better results. If we read ‘features’ for ‘heads’ then this is certainly true in machine learning, as we have seen in the preceding chapters. But we can often further improve things by combining not just features but whole models, as will be demonstrated in this chapter. Combinations of models are generally known as model ensembles. They are among the most powerful techniques in machine learning, often outperforming other methods. This comes at the cost of increased algorithmic and model complexity.
The topic of model combination has a rich and diverse history, to which we can only partly do justice in this short chapter. The main motivations came from computational learning theory on the one hand, and statistics on the other. It is a well-known statistical intuition that averaging measurements can lead to a more stable and reliable estimate because we reduce the influence of random fluctuations in single measurements. So if we were to build an ensemble of slightly different models from the same training data, we might be able to similarly reduce the influence of random fluctuations in single models. The key question here is how to achieve diversity between these different models. As we shall see, this can often be achieved by training models on random subsets of the data, and even by constructing them from random subsets of the available features.
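As a minimal sketch of that idea (not taken from the book; the base learner, helper names and toy data are assumptions), each ensemble member below is trained on a bootstrap sample of the data restricted to a random subset of the features, and predictions are combined by majority vote:

```python
# A minimal sketch of ensemble diversity via bootstrapping and random feature
# subsets (bagging combined with the random-subspace idea). All names and the
# toy data below are illustrative, not taken from the book.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_diverse_ensemble(X, y, n_models=10, n_features=2, seed=None):
    """Train each model on a bootstrap sample restricted to a random feature subset."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    ensemble = []
    for _ in range(n_models):
        rows = rng.integers(0, n, size=n)                      # bootstrap sample (with replacement)
        cols = rng.choice(d, size=n_features, replace=False)   # random feature subset
        model = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
        ensemble.append((model, cols))
    return ensemble

def predict_majority(ensemble, X):
    """Combine the members' predictions by majority vote."""
    votes = np.array([model.predict(X[:, cols]) for model, cols in ensemble])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

# Tiny synthetic example: 100 instances, 5 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
ensemble = train_diverse_ensemble(X, y, seed=1)
print(predict_majority(ensemble, X[:5]))
```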
TREE MODELS ARE among the most popular models in machine learning. For example, the pose recognition algorithm in the Kinect motion sensing device for the Xbox game console has decision tree classifiers at its heart (in fact, an ensemble of decision trees called a random forest about which you will learn more in Chapter 11). Trees are expressive and easy to understand, and of particular appeal to computer scientists due to their recursive ‘divide-and-conquer’ nature.
In fact, the paths through the logical hypothesis space discussed in the previous chapter already constitute a very simple kind of tree. For instance, the feature tree in Figure 5.1 (left) is equivalent to the path in Figure 4.6 (left) on p.117. This equivalence is best seen by tracing the path and the tree from the bottom upward.
The left-most leaf of the feature tree represents the concept at the bottom of the path, covering a single positive example.
The next concept up in the path generalises the literal Length = 3 into Length = [3,5] by means of internal disjunction; the added coverage (one positive example) is represented by the second leaf from the left in the feature tree.
By dropping the condition Teeth = few we add another two covered positives.
Dropping the ‘Length’ condition altogether (or extending the internal disjunction with the one remaining value ‘4’) adds the last positive, and also a negative.
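The following sketch is purely illustrative and does not reproduce Figure 5.1; it merely shows how a feature tree whose root tests an internal disjunction such as Length = [3,5] amounts to nested conditions, with each leaf standing for the examples it covers. The structure and the data are assumptions.

```python
# Purely illustrative sketch of a feature tree with an internal disjunction;
# the feature names echo the running example, but the tree structure and the
# instances are assumptions, not a reproduction of Figure 5.1.
def feature_tree(instance):
    """Route an instance to a leaf; in a learned tree each leaf would be
    labelled with the positives and negatives it covers in the training data."""
    if instance["Length"] in (3, 5):          # internal disjunction Length = [3, 5]
        if instance["Teeth"] == "few":
            return "leaf 1"
        else:
            return "leaf 2"
    else:                                     # the remaining value, Length = 4
        return "leaf 3"

print(feature_tree({"Length": 3, "Teeth": "few"}))   # -> leaf 1
print(feature_tree({"Length": 4, "Teeth": "many"}))  # -> leaf 3
```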
THE PREVIOUS CHAPTER introduced binary classification and associated tasks such as ranking and class probability estimation. In this chapter we will go beyond these basic tasks in a number of ways. Section 3.1 discusses how to handle more than two classes. In Section 3.2 we consider the case of a real-valued target variable. Section 3.3 is devoted to various forms of learning that are either unsupervised or aimed at learning descriptive models.
Handling more than two classes
Certain concepts are fundamentally binary. For instance, the notion of a coverage curve does not easily generalise to more than two classes. We will now consider general issues related to having more than two classes in classification, scoring and class probability estimation. The discussion will address two issues: how to evaluate multi-class performance, and how to build multi-class models out of binary models. The latter is necessary for some models, such as linear classifiers, that are primarily designed to separate two classes. Other models, including decision trees, handle any number of classes quite naturally.
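To illustrate the second issue, here is a minimal one-versus-rest sketch (the base learner and the toy data are assumptions made for the example): one binary model is learned per class, discriminating that class against all the others, and a new instance is assigned to the class whose model scores it highest.

```python
# Minimal one-versus-rest sketch for building a k-class classifier out of
# binary models; the base learner and the toy data are illustrative choices.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_one_vs_rest(X, y, classes):
    """One binary model per class: that class against all the others."""
    return {c: LogisticRegression().fit(X, (y == c).astype(int)) for c in classes}

def predict_one_vs_rest(models, X):
    """Assign each instance to the class whose binary model scores it highest."""
    classes = list(models)
    scores = np.column_stack([models[c].predict_proba(X)[:, 1] for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]

# Three well-separated clusters as a toy three-class problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 2)) + np.repeat(np.array([[0, 0], [3, 0], [0, 3]]), 50, axis=0)
y = np.repeat(np.array([0, 1, 2]), 50)
models = train_one_vs_rest(X, y, classes=[0, 1, 2])
print(predict_one_vs_rest(models, X[:5]))
```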
Multi-class classification
Classification tasks with more than two classes are very common. For instance, once a patient has been diagnosed as suffering from a rheumatic disease, the doctor will want to classify him or her further into one of several variants. If we have k classes, performance of a classifier can be assessed using a k-by-k contingency table. Assessing performance is easy if we are interested in the classifier's accuracy, which is still the sum of the descending diagonal of the contingency table, divided by the number of test instances.
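For instance, with made-up counts for a three-class problem, accuracy can be read off the contingency table as follows:

```python
# Accuracy from a k-by-k contingency table: the sum of the descending diagonal
# divided by the total number of test instances. The counts are made up.
import numpy as np

# Rows: actual class, columns: predicted class (three classes).
contingency = np.array([[15,  2,  3],
                        [ 7, 15,  8],
                        [ 2,  3, 45]])

accuracy = np.trace(contingency) / contingency.sum()
print(accuracy)   # (15 + 15 + 45) / 100 = 0.75
```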
IN THIS CHAPTER and the next we take a bird's-eye view of the wide range of different tasks that can be solved with machine learning techniques. ‘Task’ here refers to whatever it is that machine learning is intended to improve performance of (recall the definition of machine learning on p.3), for example, e-mail spam recognition. Since this is a classification task, we need to learn an appropriate classifier from training data. Many different types of classifiers exist: linear classifiers, Bayesian classifiers, distance-based classifiers, to name a few. We will refer to these different types as models; they are the subject of Chapters 4–9. Classification is just one of a range of possible tasks for which we can learn a model: other tasks that will pass the review in this chapter are class probability estimation and ranking. In the next chapter we will discuss regression, clustering and descriptive modelling. For each of these tasks we will discuss what it is, what variants exist, how performance at the task could be assessed, and how it relates to other tasks. We will start with some general notation that is used in this chapter and throughout the book (see Background 2.1 for the relevant mathematical concepts).
The objects of interest in machine learning are usually referred to as instances. The set of all possible instances is called the instance space, denoted 𝒳 in this book.
MACHINE LEARNING IS a practical subject as much as a computational one. While we may be able to prove that a particular learning algorithm converges to the theoretically optimal model under certain assumptions, we need actual data to investigate, e.g., the extent to which those assumptions are actually satisfied in the domain under consideration, or whether convergence happens quickly enough to be of practical use. We thus evaluate or run particular models or learning algorithms on one or more data sets, obtain a number of measurements and use these to answer particular questions we might be interested in. This broadly characterises what is known as machine learning experiments.
In the natural sciences, an experiment can be seen as a question to nature about a scientific theory. For example, Arthur Eddington's famous 1919 experiment to verify Einstein's theory of general relativity asked the question: Are rays of light bent by gravitational fields produced by large celestial objects such as the Sun? To answer this question, the perceived position of stars was recorded under several conditions including a total solar eclipse. Eddington was able to show that these measurements indeed differed to an extent unexplained by Newtonian physics but consistent with general relativity.
While you don't have to travel to the island of Príncipe to perform machine learning experiments, they bear some similarity to experiments in physics in that machine learning experiments pose questions about models that we try to answer by means of measurements on data.
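By analogy, a machine learning experiment might ask whether one learner outperforms another on a given data set. A minimal sketch, assuming a synthetic data set and two off-the-shelf learners, is to obtain cross-validated accuracy measurements for each and compare them:

```python
# A minimal sketch of a machine learning experiment: pose a question
# ("does learner A outperform learner B on this data set?") and answer it
# with cross-validated measurements. The data set and learners are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

for name, learner in [("decision tree", DecisionTreeClassifier(random_state=0)),
                      ("naive Bayes", GaussianNB())]:
    scores = cross_val_score(learner, X, y, cv=10)   # ten accuracy measurements
    print(f"{name}: mean {scores.mean():.3f}, std {scores.std():.3f}")
```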
RULE MODELS ARE the second major type of logical machine learning models. Generally speaking, they offer more flexibility than tree models: for instance, while decision tree branches are mutually exclusive, the potential overlap of rules may give additional information. This flexibility comes at a price, however: while it is very tempting to view a rule as a single, independent piece of information, this is often not adequate because of the way the rules are learned. Particularly in supervised learning, a rule model is more than just a set of rules: the specification of how the rules are to be combined to form predictions is a crucial part of the model.
There are essentially two approaches to supervised rule learning. One is inspired by decision tree learning: find a combination of literals – the body of the rule, which is what we previously called a concept – that covers a sufficiently homogeneous set of examples, and find a label to put in the head of the rule. The second approach goes in the opposite direction: first select a class you want to learn, and then find rule bodies that cover (large subsets of) the examples of that class. The first approach naturally leads to a model consisting of an ordered sequence of rules – a rule list – as will be discussed in Section 6.1. The second approach treats collections of rules as unordered rule sets and is the topic of Section 6.2.
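To make the second approach concrete, here is a highly simplified covering-style sketch. It is an illustration under strong assumptions (single-literal rule bodies, a crude purity requirement, made-up data), not the book's algorithm: repeatedly pick the literal that best covers the remaining examples of the chosen class, turn it into a rule, and remove the examples it covers.

```python
# A highly simplified sequential-covering sketch for learning an unordered
# rule set for one chosen class. Single-literal rule bodies only; the toy
# data and helper names are assumptions for illustration.
def covers(literal, example):
    feature, value = literal
    return example[feature] == value

def best_literal(examples, target):
    """Pick the literal covering the most examples of the target class
    and none of the other classes (a crude purity requirement)."""
    candidates = {(f, ex[f]) for ex in examples for f in ex if f != "class"}
    best, best_count = None, 0
    for lit in candidates:
        covered = [ex for ex in examples if covers(lit, ex)]
        if covered and all(ex["class"] == target for ex in covered) and len(covered) > best_count:
            best, best_count = lit, len(covered)
    return best

def learn_rule_set(examples, target):
    """Keep adding rules until no pure literal covers a remaining target example."""
    remaining, rules = list(examples), []
    while any(ex["class"] == target for ex in remaining):
        lit = best_literal(remaining, target)
        if lit is None:
            break
        rules.append((lit, target))
        remaining = [ex for ex in remaining if not covers(lit, ex)]
    return rules

data = [
    {"Gills": "no",  "Teeth": "many", "class": "dolphin"},
    {"Gills": "no",  "Teeth": "few",  "class": "dolphin"},
    {"Gills": "yes", "Teeth": "many", "class": "fish"},
    {"Gills": "yes", "Teeth": "few",  "class": "fish"},
]
print(learn_rule_set(data, "dolphin"))   # e.g. [(('Gills', 'no'), 'dolphin')]
```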
This book started life in the Summer of 2008, when my employer, the University of Bristol, awarded me a one-year research fellowship. I decided to embark on writing a general introduction to machine learning, for two reasons. One was that there was scope for such a book, to complement the many more specialist texts that are available; the other was that through writing I would learn new things – after all, the best way to learn is to teach.
The challenge facing anyone attempting to write an introductory machine learning text is to do justice to the incredible richness of the machine learning field without losing sight of its unifying principles. Put too much emphasis on the diversity of the discipline and you risk ending up with a ‘cookbook’ without much coherence; stress your favourite paradigm too much and you may leave out too much of the other interesting stuff. Partly through a process of trial and error, I arrived at the approach embodied in the book, which is to emphasise both unity and diversity: unity by separate treatment of tasks and features, both of which are common across any machine learning approach but are often taken for granted; and diversity through coverage of a wide range of logical, geometric and probabilistic models.
Clearly, one cannot hope to cover all of machine learning to any reasonable depth within the confines of 400 pages.