Search results for Pattern Recognition and Machine Learning

7 - Clustering
Jure Leskovec, Stanford University, California, Anand Rajaraman, Jeffrey David Ullman, Stanford University, California
Book: Mining of Massive Datasets
Published online: 16 April 2020
Print publication: 09 January 2020, pp 243-281
Summary

Clustering is the process of examining a collection of “points,” and grouping the points into “clusters” according to some distance measure. The goal is that points in the same cluster are a small distance from one another, while points in different clusters are a large distance from one another. A suggestion of what clusters might look like was seen in Fig. 1.1. There, the intent was to show three clusters around three different road intersections, but two of the clusters blended into one another because they were not sufficiently separated. Our goal in this chapter is to offer methods for discovering clusters in data. We are particularly interested in situations where the data is very large, where the space is high-dimensional, or where the space is not Euclidean at all. We shall therefore discuss several algorithms that assume the data does not fit in main memory. However, we begin with the basics: the two general approaches to clustering and the methods for dealing with clusters in a non-Euclidean space.
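The stated goal of clustering can be made concrete by scoring a candidate grouping: compare the average distance between points in the same cluster with the average distance between points in different clusters. This is a minimal sketch, not the chapter's method; the function names are hypothetical, points are assumed to be tuples of numbers, and Euclidean distance is only a default. Any other distance measure can be passed as `dist`, which is what a non-Euclidean space would require.

```python
import math
from itertools import combinations

def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def _mean(values):
    # Return NaN rather than dividing by zero when there are no pairs.
    return sum(values) / len(values) if values else float("nan")

def intra_inter(clusters, dist=euclidean):
    """Return (mean within-cluster distance, mean between-cluster distance).

    `clusters` is a list of clusters, each a list of points.
    `dist` is any distance function (hypothetical default: Euclidean).
    """
    within = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    between = [dist(p, q)
               for c1, c2 in combinations(clusters, 2)
               for p in c1 for q in c2]
    return _mean(within), _mean(between)

# Two well-separated groups: within-cluster distances should be much smaller.
clusters = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
within, between = intra_inter(clusters)
print(f"within={within:.2f} between={between:.2f}")
```

If the two averages were close, that would suggest the clusters are not well separated.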

1 - Data Mining
Jure Leskovec, Stanford University, California, Anand Rajaraman, Jeffrey David Ullman, Stanford University, California
Book: Mining of Massive Datasets
Published online: 16 April 2020
Print publication: 09 January 2020, pp 1-19
Summary

We begin with the essence of data mining and a discussion of how data mining is treated by the various disciplines that contribute to this field. We cover “Bonferroni’s Principle,” which is really a warning about overusing the ability to mine data. We also summarize a few ideas that are not data mining per se but help in understanding some important data-mining concepts. These include the TF.IDF measure of word importance, the behavior of hash functions and indexes, and identities involving e, the base of natural logarithms. Finally, we give an outline of the topics covered in the balance of the book.
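The summary only names the TF.IDF measure. As an assumption, the sketch below uses one common formulation: the term frequency of term i in document j is its count divided by the largest count of any term in j, the inverse document frequency is log2(N / n_i) for N documents of which n_i contain term i, and the score is their product. The function name is hypothetical and documents are assumed to be pre-tokenized word lists.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: TF.IDF score} dict per document.

    docs: list of documents, each a list of word tokens (assumed pre-tokenized).
    """
    n_docs = len(docs)
    # n_i: the number of documents that contain term i at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        # TF is normalized by the count of the most frequent term in this document.
        max_count = max(counts.values(), default=1)
        scores.append({
            term: (f / max_count) * math.log2(n_docs / doc_freq[term])
            for term, f in counts.items()
        })
    return scores

docs = [["data", "mining", "data"], ["data", "stream"], ["data", "graph"]]
for doc_scores in tf_idf(docs):
    print(doc_scores)
```

Because "data" appears in every document, its IDF is log2(1) = 0, so it scores 0 everywhere despite being the most frequent word; rarer terms such as "mining" get positive scores.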

3 - First-Order Optimization Techniques
from Part I - Mathematical Optimization
Jeremy Watt, Northwestern University, Illinois, Reza Borhani, Northwestern University, Illinois, Aggelos K. Katsaggelos, Northwestern University, Illinois
Book: Machine Learning Refined
Published online: 05 February 2020
Print publication: 09 January 2020, pp 45-74

11 - Dimensionality Reduction
Jure Leskovec, Stanford University, California, Anand Rajaraman, Jeffrey David Ullman, Stanford University, California
Book: Mining of Massive Datasets
Published online: 16 April 2020
Print publication: 09 January 2020, pp 410-440
Summary

In this chapter we shall explore the idea of dimensionality reduction in more detail. We begin with a discussion of eigenvalues and their use in “principal component analysis” (PCA). We cover singular-value decomposition, a more powerful version of UV-decomposition. Finally, because we are always interested in the largest data sizes we can handle, we look at another form of decomposition, called CUR-decomposition, which is a variant of singular-value decomposition that keeps the matrices of the decomposition sparse if the original matrix is sparse.

Acknowledgements
Jeremy Watt, Northwestern University, Illinois, Reza Borhani, Northwestern University, Illinois, Aggelos K. Katsaggelos, Northwestern University, Illinois
Book: Machine Learning Refined
Published online: 05 February 2020
Print publication: 09 January 2020, pp xxii-xxii

Preface
Jeremy Watt, Northwestern University, Illinois, Reza Borhani, Northwestern University, Illinois, Aggelos K. Katsaggelos, Northwestern University, Illinois
Book: Machine Learning Refined
Published online: 05 February 2020
Print publication: 09 January 2020, pp xii-xxi

12 - Kernel Methods
from Part III - Nonlinear Learning
Jeremy Watt, Northwestern University, Illinois, Reza Borhani, Northwestern University, Illinois, Aggelos K. Katsaggelos, Northwestern University, Illinois
Book: Machine Learning Refined
Published online: 05 February 2020
Print publication: 09 January 2020, pp 383-402

Dedication
Jeremy Watt, Northwestern University, Illinois, Reza Borhani, Northwestern University, Illinois, Aggelos K. Katsaggelos, Northwestern University, Illinois
Book: Machine Learning Refined
Published online: 05 February 2020
Print publication: 09 January 2020, pp v-vi

10 - Principles of Nonlinear Feature Engineering
from Part III - Nonlinear Learning
Jeremy Watt, Northwestern University, Illinois, Reza Borhani, Northwestern University, Illinois, Aggelos K. Katsaggelos, Northwestern University, Illinois
Book: Machine Learning Refined
Published online: 05 February 2020
Print publication: 09 January 2020, pp 275-303

13 - Neural Nets and Deep Learning
Jure Leskovec, Stanford University, California, Anand Rajaraman, Jeffrey David Ullman, Stanford University, California
Book: Mining of Massive Datasets
Published online: 16 April 2020
Print publication: 09 January 2020, pp 498-543
Summary

In this chapter, we shall consider the design of neural nets, which are collections of perceptrons, or nodes, where the outputs of one rank (or layer) of nodes become the inputs to nodes at the next layer. The last layer of nodes produces the outputs of the entire neural net. Training neural nets with many layers requires enormous numbers of training examples, but when it can be used, it has proven to be an extremely powerful technique, referred to as deep learning. We also consider several specialized forms of neural nets that have proved useful for special kinds of data. These forms are characterized by requiring that certain sets of nodes in the network share the same weights. Since learning all the weights on all the inputs to all the nodes of the network is in general a hard and time-consuming task, these special forms of network greatly simplify the process of training the network to recognize the desired class or classes of inputs. We shall study convolutional neural networks (CNNs), which are specially designed to recognize classes of images. We shall also study recurrent neural networks (RNNs) and long short-term memory networks (LSTMs), which are designed to recognize classes of sequences, such as sentences (sequences of words).
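The layered structure described in this summary can be sketched as a forward pass in which each layer's outputs become the next layer's inputs. This is a minimal illustration under stated assumptions, not the chapter's formulation: every node is fully connected to the previous layer, computes a weighted sum plus a bias, and applies a sigmoid; the function names and the (weights, bias) representation are hypothetical, and no training or weight sharing is shown.

```python
import math

def sigmoid(x):
    """Logistic activation; an assumed choice of node nonlinearity."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(layers, inputs):
    """Propagate `inputs` through a fully connected feed-forward net.

    layers: list of layers; each layer is a list of nodes, and each node is a
            (weights, bias) pair with one weight per output of the previous layer.
    Returns the outputs of the last layer, i.e. of the whole net.
    """
    activations = list(inputs)
    for layer in layers:
        # Each node's output is computed from the previous layer's outputs,
        # and the resulting list becomes the input to the next layer.
        activations = [
            sigmoid(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations

# A 2-input net with a hidden layer of two nodes and a single output node.
net = [
    [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)],  # hidden layer
    [([1.0, -1.0], 0.0)],                      # output layer
]
print(forward(net, [1.0, 2.0]))
```

Training would consist of adjusting every (weights, bias) pair; the specialized networks mentioned above reduce that work by forcing some nodes to share the same weights.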

Part IV - Appendices
Jeremy Watt, Northwestern University, Illinois, Reza Borhani, Northwestern University, Illinois, Aggelos K. Katsaggelos, Northwestern University, Illinois
Book: Machine Learning Refined
Published online: 05 February 2020
Print publication: 09 January 2020, pp 471-472

Index
Jeremy Watt, Northwestern University, Illinois, Reza Borhani, Northwestern University, Illinois, Aggelos K. Katsaggelos, Northwestern University, Illinois
Book: Machine Learning Refined
Published online: 05 February 2020
Print publication: 09 January 2020, pp 569-574

Contents
Jeremy Watt, Northwestern University, Illinois, Reza Borhani, Northwestern University, Illinois, Aggelos K. Katsaggelos, Northwestern University, Illinois
Book: Machine Learning Refined
Published online: 05 February 2020
Print publication: 09 January 2020, pp vii-xi

7 - Linear Multi-Class Classification
from Part II - Linear Learning
Jeremy Watt, Northwestern University, Illinois, Reza Borhani, Northwestern University, Illinois, Aggelos K. Katsaggelos, Northwestern University, Illinois
Book: Machine Learning Refined
Published online: 05 February 2020
Print publication: 09 January 2020, pp 174-207

Preface
Jure Leskovec, Stanford University, California, Anand Rajaraman, Jeffrey David Ullman, Stanford University, California
Book: Mining of Massive Datasets
Published online: 16 April 2020
Print publication: 09 January 2020, pp ix-xii

8 - Advertising on the Web
Jure Leskovec, Stanford University, California, Anand Rajaraman, Jeffrey David Ullman, Stanford University, California
Book: Mining of Massive Datasets
Published online: 16 April 2020
Print publication: 09 January 2020, pp 282-306
Summary

By far the most lucrative venue for on-line advertising has been search, and much of the effectiveness of search advertising comes from the “adwords” model of matching search queries to advertisements. We shall therefore devote much of this chapter to algorithms for optimizing the way this assignment is done. The algorithms used are of an unusual type; they are greedy and they are “on-line” in a particular technical sense to be discussed. We shall therefore digress to discuss these two algorithmic issues – greediness and on-line algorithms – in general, before tackling the adwords problem. A second interesting on-line advertising problem involves selecting items to advertise at an on-line store. This problem involves “collaborative filtering,” where we try to find customers with similar behavior in order to suggest they buy things that similar customers have bought. This subject will be treated in Section 9.3.

Part II - Linear Learning
Jeremy Watt, Northwestern University, Illinois, Reza Borhani, Northwestern University, Illinois, Aggelos K. Katsaggelos, Northwestern University, Illinois
Book: Machine Learning Refined
Published online: 05 February 2020
Print publication: 09 January 2020, pp 97-98

Index
Jure Leskovec, Stanford University, California, Anand Rajaraman, Jeffrey David Ullman, Stanford University, California
Book: Mining of Massive Datasets
Published online: 16 April 2020
Print publication: 09 January 2020, pp 544-553

10 - Mining Social-Network Graphs
Jure Leskovec, Stanford University, California, Anand Rajaraman, Jeffrey David Ullman, Stanford University, California
Book: Mining of Massive Datasets
Published online: 16 April 2020
Print publication: 09 January 2020, pp 340-409
Summary

In this chapter, we shall study techniques for analyzing social networks. An important question is how to identify “communities,” that is, subsets of the nodes (people or other entities that form the network) with unusually strong connections. Some of the techniques used to identify communities are similar to the clustering algorithms we discussed in Chapter 7. However, communities almost never partition the set of nodes in a network. Rather, communities usually overlap. For example, you may belong to several communities of friends or classmates. The people from one community tend to know each other, but people from two different communities rarely know each other. You would not want to be assigned to only one of the communities, nor would it make sense to cluster all the people from all your communities into one cluster. Also in this chapter we explore efficient algorithms for discovering other properties of graphs. We look at “simrank,” a way to discover similarities among nodes of a graph. We then explore triangle counting as a way to measure the connectedness of a community. In addition, we give efficient algorithms for exact and approximate measurement of the neighborhood sizes of nodes in a graph, and we look at efficient algorithms for computing the transitive closure.

6 - Frequent Itemsets
Jure Leskovec, Stanford University, California, Anand Rajaraman, Jeffrey David Ullman, Stanford University, California
Book: Mining of Massive Datasets
Published online: 16 April 2020
Print publication: 09 January 2020, pp 206-242
Summary

We turn in this chapter to the discovery of frequent itemsets. To begin, we introduce the “market-basket” model of data, which is essentially a many-many relationship between two kinds of elements, called “items” and “baskets,” but with some assumptions about the shape of the data. The frequent-itemsets problem is that of finding sets of items that appear in (are related to) many of the same baskets. The problem of finding frequent itemsets differs from the similarity search discussed in Chapter 3. Here we are interested in the absolute number of baskets that contain a particular set of items. In Chapter 3 we wanted items that have a large fraction of their baskets in common, even if the absolute number of baskets is small. The difference leads to a new class of algorithms for finding frequent itemsets. We begin with the A-Priori Algorithm, which works by eliminating most large sets as candidates by looking first at smaller sets and recognizing that a large set cannot be frequent unless all its subsets are. We then consider various improvements to the basic A-Priori idea, concentrating on very large data sets that stress the available main memory. Next, we consider approximate algorithms that work faster but are not guaranteed to find all frequent itemsets. Finally, we discuss briefly how to find frequent itemsets in a data stream.
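The idea behind A-Priori, that a set can be frequent only if all its subsets are, can be sketched as a level-wise search: count single items, then build size-k candidates only from frequent size-(k-1) sets whose every (k-1)-subset is frequent. This is a minimal in-memory sketch, assuming the baskets fit in main memory and `support` is an absolute basket count; it omits the memory-saving refinements the summary mentions, and the function name is hypothetical.

```python
from itertools import combinations

def apriori(baskets, support):
    """Return every itemset (as a frozenset) contained in >= `support` baskets."""
    baskets = [frozenset(b) for b in baskets]

    # Level 1: count individual items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, n in counts.items() if n >= support}
    result = set(frequent)

    k = 2
    while frequent:
        # Candidates of size k: unions of two frequent (k-1)-sets whose
        # (k-1)-subsets are all frequent; any other k-set cannot be frequent.
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count only the surviving candidates in one pass over the baskets.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        frequent = {c for c, n in counts.items() if n >= support}
        result |= frequent
        k += 1
    return result

baskets = [
    {"milk", "bread"},
    {"milk", "bread", "beer"},
    {"milk", "beer"},
    {"bread", "beer"},
    {"milk", "bread", "beer"},
]
# Prints the three single items and the three pairs; the triple is in only 2 baskets.
for itemset in sorted(apriori(baskets, support=3), key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset))
```

Here the triple {milk, bread, beer} is still generated as a candidate, because all three of its pairs are frequent, but it is then rejected by the count. Any set containing an infrequent pair would never have been counted at all.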

2514 results in Pattern Recognition and Machine Learning
