Search results for Pattern Recognition and Machine Learning

29 - Multiclass Learnability
from Part 4 - Advanced Theory
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 351-358
- Chapter

17 - Multiclass, Ranking, and Complex Prediction Problems
from Part 2 - From Theory to Algorithms
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 190-211
- Chapter

22 - Clustering
from Part 3 - Additional Learning Models
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 264-277
- Chapter
Summary

Clustering is one of the most widely used techniques for exploratory data analysis. Across all disciplines, from social sciences to biology to computer science, people try to get a first intuition about their data by identifying meaningful groups among the data points. For example, computational biologists cluster genes on the basis of similarities in their expression in different experiments; retailers cluster customers, on the basis of their customer profiles, for the purpose of targeted marketing; and astronomers cluster stars on the basis of their spatial proximity.
The first point that one should clarify is, naturally, what is clustering? Intuitively, clustering is the task of grouping a set of objects such that similar objects end up in the same group and dissimilar objects are separated into different groups. Clearly, this description is quite imprecise and possibly ambiguous. Quite surprisingly, it is not at all clear how to come up with a more rigorous definition.
There are several sources for this difficulty. One basic problem is that the two objectives mentioned in the earlier statement may in many cases contradict each other. Mathematically speaking, similarity (or proximity) is not a transitive relation, while cluster sharing is an equivalence relation and, in particular, it is a transitive relation. More concretely, it may be the case that there is a long sequence of objects, x_1, …, x_m, such that each x_i is very similar to its two neighbors, x_{i−1} and x_{i+1}, but x_1 and x_m are very dissimilar.
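The non-transitivity of similarity can be seen in a small sketch. The points, the distance-based notion of similarity, and the `THRESHOLD` cutoff below are illustrative assumptions, not taken from the chapter.

```python
# Ten points on a line, each exactly 1.0 away from its neighbors.
points = [float(i) for i in range(10)]

# Hypothetical cutoff: two points count as "similar" if they are this close.
THRESHOLD = 1.5


def similar(a: float, b: float) -> bool:
    """Distance-based similarity: symmetric, but not transitive."""
    return abs(a - b) <= THRESHOLD


# Every consecutive pair in the chain is similar...
neighbors_similar = all(
    similar(points[i], points[i + 1]) for i in range(len(points) - 1)
)
# ...yet the two endpoints of the chain are not.
endpoints_similar = similar(points[0], points[-1])

print(neighbors_similar, endpoints_similar)
```

Any rule that puts similar points in the same cluster would merge the whole chain, while any rule that separates dissimilar points would split its endpoints, so the two objectives conflict on this data.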

20 - Neural Networks
from Part 2 - From Theory to Algorithms
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 228-242
- Chapter
Summary

An artificial neural network is a model of computation inspired by the structure of neural networks in the brain. In simplified models, the brain consists of a large number of basic computing devices (neurons) that are connected to each other in a complex communication network, through which the brain is able to carry out highly complex computations. Artificial neural networks are formal computation constructs that are modeled after this computation paradigm.
Learning with neural networks was proposed in the mid-20th century. It yields an effective learning paradigm and has recently been shown to achieve cutting-edge performance on several learning tasks.
A neural network can be described as a directed graph whose nodes correspond to neurons and edges correspond to links between them. Each neuron receives as input a weighted sum of the outputs of the neurons connected to its incoming edges. We focus on feedforward networks in which the underlying graph does not contain cycles.
In the context of learning, we can define a hypothesis class consisting of neural network predictors, where all the hypotheses share the underlying graph structure of the network and differ in the weights over edges. As we will show in Section 20.3, every predictor over n variables that can be implemented in time T(n) can also be expressed as a neural network predictor of size O(T(n)^2), where the size of the network is the number of nodes in it.
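A minimal sketch of a layered feedforward network may help make the description concrete. The step activation, the constant neuron used as a bias, and the hand-picked weights in `xor_net` are assumptions made for illustration; the summary does not specify any of them.

```python
def step(z: float) -> float:
    """Threshold activation: the neuron fires (1.0) when its input is positive."""
    return 1.0 if z > 0 else 0.0


def forward(x, layers, activation=step):
    """Propagate x through a layered feedforward network.

    layers[t][j] holds the weights on the edges entering neuron j of layer t+1;
    the last weight in each row belongs to a constant neuron that outputs 1.
    """
    out = list(x)
    for weights in layers:
        inputs = out + [1.0]  # append the constant (bias) neuron
        # Each neuron receives a weighted sum of the previous layer's outputs.
        out = [
            activation(sum(w * v for w, v in zip(row, inputs)))
            for row in weights
        ]
    return out


# Hypothetical weights computing XOR: hidden neurons act as OR and AND,
# and the output neuron fires when OR is on and AND is off.
xor_net = [
    [[1.0, 1.0, -0.5], [1.0, 1.0, -1.5]],
    [[1.0, -1.0, -0.5]],
]

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, forward(x, xor_net))
```

Keeping the shape of `xor_net` fixed and changing only the numbers gives a different predictor over the same graph, which is the sense in which the hypotheses in the class differ only in their edge weights.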

Appendix C - Linear Algebra
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 380-384
- Chapter

5 - The Bias-Complexity Trade-off
from Part 1 - Foundations
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 36-42
- Chapter

15 - Support Vector Machines
from Part 2 - From Theory to Algorithms
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 167-178
- Chapter

24 - Generative Models
from Part 3 - Additional Learning Models
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 295-308
- Chapter
Summary

We started this book with a distribution-free learning framework; namely, we did not impose any assumptions on the underlying distribution over the data. Furthermore, we followed a discriminative approach in which our goal is not to learn the underlying distribution but rather to learn an accurate predictor. In this chapter we describe a generative approach, in which it is assumed that the underlying distribution over the data has a specific parametric form and our goal is to estimate the parameters of the model. This task is called parametric density estimation.
The discriminative approach has the advantage of directly optimizing the quantity of interest (the prediction accuracy) instead of learning the underlying distribution. This was phrased as follows by Vladimir Vapnik in his principle for solving problems using a restricted amount of information:
When solving a given problem, try to avoid a more general problem as an intermediate step.
Of course, if we succeed in learning the underlying distribution accurately, we are considered to be “experts” in the sense that we can predict by using the Bayes optimal classifier. The problem is that it is usually more difficult to learn the underlying distribution than to learn an accurate predictor. However, in some situations, it is reasonable to adopt the generative learning approach. For example, sometimes it is easier (computationally) to estimate the parameters of the model than to learn a discriminative predictor.
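As a small illustration of parametric density estimation, the sketch below assumes the data come from a one-dimensional Gaussian and estimates its two parameters by maximum likelihood. Both the Gaussian form and the choice of maximum likelihood are assumptions made here; the summary does not name a specific model or estimator.

```python
def gaussian_mle(samples):
    """Maximum likelihood estimates of the mean and variance of a 1-D Gaussian.

    The variance estimate divides by m (not m - 1), which is what maximizing
    the likelihood gives.
    """
    m = len(samples)
    mu = sum(samples) / m
    var = sum((x - mu) ** 2 for x in samples) / m
    return mu, var


# Made-up observations, for illustration only.
data = [1.0, 2.0, 3.0, 4.0]
mu_hat, var_hat = gaussian_mle(data)
print(mu_hat, var_hat)
```

Here the parameters have a closed form, which is one instance of the case mentioned above where estimating the model is computationally easy.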

21 - Online Learning
from Part 3 - Additional Learning Models
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 245-263
- Chapter
Summary

In this chapter we describe a different model of learning, which is called online learning. Previously, we studied the PAC learning model, in which the learner first receives a batch of training examples, uses the training set to learn a hypothesis, and only when learning is completed uses the learned hypothesis for predicting the label of new examples. In our papayas learning problem, this means that we should first buy a bunch of papayas and taste them all. Then, we use all of this information to learn a prediction rule that determines the taste of new papayas. In contrast, in online learning there is no separation between a training phase and a prediction phase. Instead, each time we buy a papaya, it is first considered a test example since we should predict whether it is going to taste good. Then, after taking a bite from the papaya, we know the true label, and the same papaya can be used as a training example that can help us improve our prediction mechanism for future papayas.
Concretely, online learning takes place in a sequence of consecutive rounds. On each online round, the learner first receives an instance (the learner buys a papaya and knows its shape and color, which form the instance). Then, the learner is required to predict a label (is the papaya tasty?). At the end of the round, the learner obtains the correct label (he tastes the papaya and then knows whether it is tasty or not).
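The round structure described above can be sketched as a short loop. The `Perceptron` learner, the two numeric features, and the example stream are hypothetical choices made for illustration; the summary only fixes the order of events within a round.

```python
class Perceptron:
    """A simple online learner over real-valued feature vectors (an assumed choice)."""

    def __init__(self, dim: int):
        self.w = [0.0] * dim

    def predict(self, x) -> int:
        score = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if score > 0 else -1

    def update(self, x, y: int) -> None:
        # Change the weights only when the current prediction is wrong.
        if self.predict(x) != y:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]


def run_online(learner, stream) -> int:
    """Run the online protocol and return the number of prediction mistakes."""
    mistakes = 0
    for x, y in stream:
        y_hat = learner.predict(x)  # 1. receive the instance and predict a label
        if y_hat != y:              # 2. the correct label is revealed
            mistakes += 1
        learner.update(x, y)        # 3. the same example now serves as training data
    return mistakes


# Hypothetical papayas: (softness, color) features, label +1 for tasty.
stream = [
    ((1.0, 0.0), 1),
    ((0.0, 1.0), -1),
    ((1.0, 0.2), 1),
    ((0.1, 1.0), -1),
]
mistakes = run_online(Perceptron(dim=2), stream)
print(mistakes)
```

There is no separate training phase: every example is first a test example and only afterwards a training example, and performance is measured by the mistakes made along the way.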

3 - A Formal Learning Model
from Part 1 - Foundations
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 22-30
- Chapter

Appendix B - Measure Concentration
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 372-379
- Chapter

4 - Learning via Uniform Convergence
from Part 1 - Foundations
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 31-35
- Chapter

7 - Nonuniform Learnability
from Part 1 - Foundations
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 58-72
- Chapter
Summary

The notions of PAC learnability discussed so far in the book allow the sample sizes to depend on the accuracy and confidence parameters, but they are uniform with respect to the labeling rule and the underlying data distribution. Consequently, classes that are learnable in that respect are limited (they must have a finite VC-dimension, as stated by Theorem 6.7). In this chapter we consider more relaxed, weaker notions of learnability. We discuss the usefulness of such notions and provide characterization of the concept classes that are learnable using these definitions.
We begin this discussion by defining a notion of “nonuniform learnability” that allows the sample size to depend on the hypothesis to which the learner is compared. We then provide a characterization of nonuniform learnability and show that nonuniform learnability is a strict relaxation of agnostic PAC learnability. We also show that a sufficient condition for nonuniform learnability is that H is a countable union of hypothesis classes, each of which enjoys the uniform convergence property. These results will be proved in Section 7.2 by introducing a new learning paradigm, which is called Structural Risk Minimization (SRM). In Section 7.3 we specify the SRM paradigm for countable hypothesis classes, which yields the Minimum Description Length (MDL) paradigm. The MDL paradigm gives a formal justification to a philosophical principle of induction called Occam's razor. Next, in Section 7.4 we introduce consistency as an even weaker notion of learnability.

Preface
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp xv-xvi
- Chapter
Summary

The term machine learning refers to the automated detection of meaningful patterns in data. In the past couple of decades it has become a common tool in almost any task that requires information extraction from large data sets. We are surrounded by machine learning–based technology: search engines learn how to bring us the best results (while placing profitable ads), antispam software learns to filter our email messages, and credit card transactions are secured by software that learns to detect fraud. Digital cameras learn to detect faces, and intelligent personal assistant applications on smartphones learn to recognize voice commands. Cars are equipped with accident-prevention systems that are built using machine learning algorithms. Machine learning is also widely used in scientific applications such as bioinformatics, medicine, and astronomy.
One common feature of all of these applications is that, in contrast to more traditional uses of computers, in these cases, due to the complexity of the patterns that need to be detected, a human programmer cannot provide an explicit, fine-detailed specification of how such tasks should be executed. Taking examples from intelligent beings, many of our skills are acquired or refined through learning from our experience (rather than following explicit instructions given to us). Machine learning tools are concerned with endowing programs with the ability to “learn” and adapt.
The first goal of this book is to provide a rigorous, yet easy-to-follow, introduction to the main concepts underlying machine learning: What is learning?

16 - Kernel Methods
from Part 2 - From Theory to Algorithms
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 179-189
- Chapter
Summary

In the previous chapter we described the SVM paradigm for learning halfspaces in high dimensional feature spaces. This enables us to enrich the expressive power of halfspaces by first mapping the data into a high dimensional feature space, and then learning a linear predictor in that space. This is similar to the AdaBoost algorithm, which learns a composition of a halfspace over base hypotheses. While this approach greatly extends the expressiveness of halfspace predictors, it raises both sample complexity and computational complexity challenges. In the previous chapter we tackled the sample complexity issue using the concept of margin. In this chapter we tackle the computational complexity challenge using the method of kernels.
We start the chapter by describing the idea of embedding the data into a high dimensional feature space. We then introduce the idea of kernels. A kernel is a type of similarity measure between instances. The special property of kernel similarities is that they can be viewed as inner products in some Hilbert space (or Euclidean space of some high dimension) to which the instance space is virtually embedded. We introduce the “kernel trick” that enables computationally efficient implementation of learning, without explicitly handling the high dimensional representation of the domain instances. Kernel-based learning algorithms, and in particular kernel-SVM, are very useful and popular machine learning tools. Their success may be attributed both to being flexible for accommodating domain specific prior knowledge and to having a well developed set of efficient implementation algorithms.
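A small sketch can show what "viewed as inner products" means in practice. It uses a polynomial kernel of degree k, chosen here as an example since the summary does not name a specific kernel, and checks that the kernel value equals an inner product in an explicit feature space that the kernel never has to build. The function names are illustrative.

```python
from itertools import product


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z, k: int = 2) -> float:
    """Polynomial kernel (1 + <x, z>)^k, computed in the original space."""
    return (1.0 + dot(x, z)) ** k


def explicit_features(x, k: int = 2):
    """Explicit embedding: all products of k coordinates of (1, x_1, ..., x_n).

    The result has (n + 1)^k entries, so it grows quickly with n and k.
    """
    ext = [1.0] + list(x)
    feats = []
    for idx in product(range(len(ext)), repeat=k):
        p = 1.0
        for i in idx:
            p *= ext[i]
        feats.append(p)
    return feats


x, z = [1.0, 2.0], [3.0, -1.0]
via_kernel = poly_kernel(x, z)
via_features = dot(explicit_features(x), explicit_features(z))
print(via_kernel, via_features)
```

The kernel costs about n multiplications, while the explicit route touches (n + 1)^k features; an algorithm that only needs inner products between instances can therefore work with the kernel alone.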

Index
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 395-397
- Chapter

30 - Compression Bounds
from Part 4 - Advanced Theory
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 359-363
- Chapter

12 - Convex Learning Problems
from Part 2 - From Theory to Algorithms
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 124-136
- Chapter
Summary

In this chapter we introduce convex learning problems. Convex learning comprises an important family of learning problems, mainly because most of what we can learn efficiently falls into it. We have already encountered linear regression with the squared loss and logistic regression, which are convex problems, and indeed they can be learned efficiently. We have also seen nonconvex problems, such as halfspaces with the 0-1 loss, which is known to be computationally hard to learn in the unrealizable case.
In general, a convex learning problem is a problem whose hypothesis class is a convex set, and whose loss function is a convex function for each example. We begin the chapter with some required definitions of convexity. Besides convexity, we will define Lipschitzness and smoothness, which are additional properties of the loss function that facilitate successful learning. We next turn to defining convex learning problems and demonstrate the necessity for further constraints such as Boundedness and Lipschitzness or Smoothness. We define these more restricted families of learning problems and claim that Convex-Smooth/Lipschitz-Bounded problems are learnable. These claims will be proven in the next two chapters, in which we will present two learning paradigms that successfully learn all problems that are either convex-Lipschitz-bounded or convex-smooth-bounded.
Finally, in Section 12.3, we show how one can handle some nonconvex problems by minimizing “surrogate” loss functions that are convex (instead of the original nonconvex loss function). Surrogate convex loss functions give rise to efficient solutions but might increase the risk of the learned predictor.
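To illustrate the surrogate idea, the sketch below compares the 0-1 loss of a halfspace with the hinge loss. The hinge loss is used here as one common convex surrogate and is an assumption of this example, since the summary does not name a particular surrogate; the weight vector and margins are made up.

```python
def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def zero_one_loss(w, x, y: int) -> float:
    """Nonconvex in w: 1 if the halfspace w misclassifies (x, y), else 0."""
    return 1.0 if y * dot(w, x) <= 0 else 0.0


def hinge_loss(w, x, y: int) -> float:
    """Convex in w, and never smaller than the 0-1 loss."""
    return max(0.0, 1.0 - y * dot(w, x))


# Sweep the margin y * <w, x> with a one-dimensional w = [1.0].
w = [1.0]
rows = []
for margin in (-2.0, -0.5, 0.0, 0.5, 1.0, 2.0):
    x, y = [margin], 1
    rows.append((margin, zero_one_loss(w, x, y), hinge_loss(w, x, y)))

for margin, l01, lh in rows:
    print(f"margin={margin:+.1f}  0-1={l01:.1f}  hinge={lh:.1f}")
```

At margin 0.5 the example is classified correctly (0-1 loss 0) but the hinge loss is still positive, which shows why minimizing the surrogate is not the same as minimizing the original loss.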

10 - Boosting
from Part 2 - From Theory to Algorithms
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 101-113
- Chapter
Summary

Boosting is an algorithmic paradigm that grew out of a theoretical question and became a very practical machine learning tool. The boosting approach uses a generalization of linear predictors to address two major issues that have been raised earlier in the book. The first is the bias-complexity tradeoff. We have seen (in Chapter 5) that the error of an ERM learner can be decomposed into a sum of approximation error and estimation error. The more expressive the hypothesis class the learner is searching over, the smaller the approximation error is, but the larger the estimation error becomes. A learner is thus faced with the problem of picking a good tradeoff between these two considerations. The boosting paradigm allows the learner to have smooth control over this tradeoff. The learning starts with a basic class (that might have a large approximation error), and as it progresses the class that the predictor may belong to grows richer.
The second issue that boosting addresses is the computational complexity of learning. As seen in Chapter 8, for many interesting concept classes the task of finding an ERM hypothesis may be computationally infeasible. A boosting algorithm amplifies the accuracy of weak learners. Intuitively, one can think of a weak learner as an algorithm that uses a simple “rule of thumb” to output a hypothesis that comes from an easy-to-learn hypothesis class and performs just slightly better than a random guess.
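The way a boosting algorithm amplifies weak learners can be sketched with AdaBoost over one-dimensional threshold rules ("decision stumps"). AdaBoost, the stump learner, and the toy data are choices made for this illustration; the summary itself does not name a specific algorithm.

```python
import math


def stump_predict(theta: float, sign: int, x: float) -> int:
    """Rule of thumb: predict `sign` left of the threshold, `-sign` right of it."""
    return sign if x <= theta else -sign


def best_stump(xs, ys, dist):
    """Weak learner: the stump with the smallest weighted error under dist."""
    best = None
    for theta in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(
                d for d, x, y in zip(dist, xs, ys)
                if stump_predict(theta, sign, x) != y
            )
            if best is None or err < best[0]:
                best = (err, theta, sign)
    return best


def adaboost(xs, ys, rounds: int):
    m = len(xs)
    dist = [1.0 / m] * m  # start from the uniform distribution over examples
    ensemble = []
    for _ in range(rounds):
        err, theta, sign = best_stump(xs, ys, dist)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        weight = 0.5 * math.log(1.0 / err - 1.0)
        ensemble.append((weight, theta, sign))
        # Increase the weight of misclassified examples, then renormalize.
        dist = [
            d * math.exp(-weight * y * stump_predict(theta, sign, x))
            for d, x, y in zip(dist, xs, ys)
        ]
        total = sum(dist)
        dist = [d / total for d in dist]
    return ensemble


def predict(ensemble, x: float) -> int:
    score = sum(w * stump_predict(theta, sign, x) for w, theta, sign in ensemble)
    return 1 if score >= 0 else -1


# Toy data: no single stump labels all six points correctly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1, 1, -1, -1, 1, 1]

single_error = best_stump(xs, ys, [1.0 / 6] * 6)[0]
ensemble = adaboost(xs, ys, rounds=3)
train_errors = sum(predict(ensemble, x) != y for x, y in zip(xs, ys))
print(single_error, train_errors)
```

Each stump alone is only somewhat better than guessing on this data, while the weighted vote of three stumps fits all six points. The number of rounds controls how rich the combined class becomes, which is the tradeoff control described in the summary.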

19 - Nearest Neighbor
from Part 2 - From Theory to Algorithms
Shai Shalev-Shwartz, Hebrew University of Jerusalem, Shai Ben-David, University of Waterloo, Ontario
Book:

Understanding Machine Learning

Published online:

05 July 2014

Print publication:

19 May 2014, pp 219-227
- Chapter

2327 results in Pattern Recognition and Machine Learning