
12 - Large-Scale Machine Learning

Published online by Cambridge University Press: 05 December 2014

Jure Leskovec, Stanford University, California
Anand Rajaraman, Milliways Laboratories, California
Jeffrey David Ullman, Stanford University, California

Summary

Many algorithms are today classified as “machine learning.” These algorithms share, with the other algorithms studied in this book, the goal of extracting information from data. All algorithms for analysis of data are designed to produce a useful summary of the data, from which decisions are made. Among many examples, the frequent-itemset analysis that we did in Chapter 6 produces information like association rules, which can then be used for planning a sales strategy or for many other purposes.

However, algorithms called “machine learning” not only summarize our data; they are perceived as learning a model or classifier from the data, and thus as discovering something about data that will be seen in the future. For instance, the clustering algorithms discussed in Chapter 7 produce clusters that not only tell us something about the data being analyzed (the training set), but also allow us to classify future data into one of the clusters that result from the clustering algorithm. Thus, machine-learning enthusiasts often speak of clustering with the neologism “unsupervised learning”; the term unsupervised refers to the fact that the input data does not tell the clustering algorithm what the clusters should be. In supervised machine learning, which is the subject of this chapter, the available data includes information about the correct way to classify at least some of the data. The data already classified is called the training set.
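To make the supervised/unsupervised distinction concrete, here is a minimal illustration (not from the chapter; the feature values and class names are invented): in the supervised setting each training example carries its correct class, while a clustering algorithm sees only the feature vectors.

# Supervised: the training set pairs each feature vector with its known class.
labeled_training_set = [
    ((5.1, 3.5), "class-A"),
    ((4.9, 3.0), "class-A"),
    ((6.7, 3.1), "class-B"),
]

# Unsupervised (clustering): only feature vectors are available; the algorithm
# itself must decide what the groups should be.
unlabeled_data = [
    (5.0, 3.4),
    (6.9, 3.2),
]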

In this chapter, we do not attempt to cover all the different approaches to machine learning. We concentrate on methods that are suitable for very large data and that have the potential for parallel implementation. We consider the classical “perceptron” approach to learning a data classifier, where a hyperplane that separates two classes is sought. Then, we look at more modern techniques involving support-vector machines. Similar to perceptrons, these methods look for hyperplanes that best divide the classes, so that few, if any, members of the training set lie close to the hyperplane. We end with a discussion of nearest-neighbor techniques, where each data point is classified according to the class(es) of its nearest neighbors in some space.
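As a concrete illustration of the perceptron idea just described, the following is a minimal sketch in Python, not the chapter's own implementation; the names train_perceptron, eta, and epochs, and the toy training set, are assumptions made for this example. The loop nudges a hyperplane w·x + b = 0 toward any misclassified training point until the two classes, labeled +1 and -1, are separated (or the epoch budget runs out).

import numpy as np

def train_perceptron(X, y, eta=0.1, epochs=100):
    # Seek a separating hyperplane w.x + b = 0 for labels y in {+1, -1}.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified (or on the plane)
                w += eta * yi * xi             # move the hyperplane toward xi
                b += eta * yi
                errors += 1
        if errors == 0:                        # every training point is on the right side
            break
    return w, b

# Toy training set: two linearly separable classes.
X = np.array([[2.0, 3.0], [1.0, 4.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(w, b)  # a new point x would be classified by the sign of w.x + b

The support-vector-machine methods mentioned above choose among such separating hyperplanes one that keeps the training points as far from it as possible, rather than stopping at the first hyperplane that happens to separate them.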

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2014


