Mining of Massive Datasets

Jure Leskovec; Anand Rajaraman; Jeffrey David Ullman

doi:10.1017/9781108684163

Chapter 7: Clustering

pp. 243-281

Jure Leskovec, Stanford University, California

Anand Rajaraman, Rocketship VC

Jeffrey David Ullman, Stanford University, California


Summary

Clustering is the process of examining a collection of “points” and grouping the points into “clusters” according to some distance measure. The goal is that points in the same cluster are a small distance from one another, while points in different clusters are a large distance apart. Fig. 1.1 suggested what clusters might look like; there, the intent was to show three clusters around three different road intersections, but two of the clusters blended into one another because they were not sufficiently separated. Our goal in this chapter is to offer methods for discovering clusters in data. We are particularly interested in situations where the data is very large, and/or where the space is either high-dimensional or not Euclidean at all. We shall therefore discuss several algorithms that assume the data does not fit in main memory. However, we begin with the basics: the two general approaches to clustering and the methods for dealing with clusters in a non-Euclidean space.
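As a rough illustration of the stated goal (small distances within a cluster, large distances between clusters), the following minimal sketch compares the mean pairwise distance inside clusters with the mean pairwise distance across clusters. The toy points, the labels, and the helper names `euclidean` and `mean_pairwise_distances` are assumptions made for this example and are not taken from the chapter; Euclidean distance is only one possible distance measure.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_pairwise_distances(points, labels, dist=euclidean):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Assumes at least one same-cluster pair and one cross-cluster pair exist.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        # Same label: the pair lies within one cluster; otherwise it spans two.
        (intra if labels[i] == labels[j] else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = [0, 0, 0, 1, 1, 1]

intra, inter = mean_pairwise_distances(points, labels)
print(f"mean intra-cluster distance: {intra:.3f}")
print(f"mean inter-cluster distance: {inter:.3f}")
```

For a good clustering under this measure, the first number should be much smaller than the second. Because `dist` is a parameter, a non-Euclidean distance function could be passed in instead.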

Keywords

clustering
distance measure
hierarchical clustering
k-means algorithm
non-Euclidean clustering

About the book

Chapter DOI https://doi.org/10.1017/9781108684163.008
Book DOI https://doi.org/10.1017/9781108684163
Subjects Computer Science; Data Science; Databases, Data Mining, and Information Retrieval; Machine Learning and Pattern Recognition
Format: Hardback
- Publication date: 13 February 2020
- ISBN: 9781108476348
Format: Digital
- Publication date: 16 April 2020
- ISBN: 9781108684163
