
Mining of Massive Datasets

3rd Edition

$74.99 (P)

  • Date Published: February 2020
  • availability: In stock
  • format: Hardback
  • isbn: 9781108476348

  • Written by leading authorities in database and Web technologies, this book is essential reading for students and practitioners alike. The popularity of the Web and of Internet commerce has produced many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and that can be applied successfully to even the largest datasets. It begins with a discussion of the MapReduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and of stream-processing algorithms for mining data that arrives too fast for exhaustive processing. Other chapters cover the PageRank idea and related tricks for organizing the Web, the problem of finding frequent itemsets, and clustering. This third edition includes new and extended coverage of decision trees, deep learning, and mining social-network graphs.

    • Contains brand new material on deep learning, decision trees, and mining social-network graphs
    • Includes a range of more than 250 exercises to challenge even the most able student
    • Slides, homework assignments, project requirements, and exams are available from

    Product details

    • Edition: 3rd Edition
    • dimensions: 253 x 178 x 28 mm
    • weight: 1.24 kg
    • contains: 76 b/w illus., 250 exercises
  • Table of Contents

    1. Data mining
    2. MapReduce and the new software stack
    3. Finding similar items
    4. Mining data streams
    5. Link analysis
    6. Frequent itemsets
    7. Clustering
    8. Advertising on the web
    9. Recommendation systems
    10. Mining social-network graphs
    11. Dimensionality reduction
    12. Large-scale machine learning
    13. Neural nets and deep learning

    Mining of Massive Datasets

    Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman

  • Authors

    Jure Leskovec, Stanford University, California
    Jure Leskovec is Associate Professor of Computer Science at Stanford University, California. His research focuses on mining and modeling large social and information networks, their evolution, and the diffusion of information and influence over them. The problems he investigates are motivated by large-scale data, the Web, and online media. This research has won several awards, including a Microsoft Research Faculty Fellowship, the Alfred P. Sloan Fellowship, an Okawa Foundation Fellowship, and numerous best paper awards. His research has also been featured in popular press outlets such as the New York Times, the Wall Street Journal, the Washington Post, MIT Technology Review, NBC, BBC, CBC, and Wired. Leskovec has authored the Stanford Network Analysis Platform (SNAP), a general-purpose network analysis and graph mining library that easily scales to massive networks with hundreds of millions of nodes and billions of edges. He is also an Investigator at the Chan Zuckerberg Biohub. You can follow him on Twitter at @jure.

    Anand Rajaraman, Rocketship VC
    Anand Rajaraman is a serial entrepreneur, venture capitalist, and academic based in Silicon Valley. He is a Founding Partner at Rocketship VC, an innovative venture capital firm that uses data mining and machine learning to find promising startup investments all over the world. Rajaraman's investments include Facebook (he was one of its earliest angel investors, in 2005), Lyft, Aster Data Systems (acquired by Teradata), Efficient Frontier (acquired by Adobe), Neoteris (acquired by Juniper), Transformic (acquired by Google), and several others. Until recently, Rajaraman was Senior Vice President at Walmart Global eCommerce and co-head of @WalmartLabs, where he worked at the intersection of social, mobile, and commerce. He joined Walmart in 2011, when Walmart acquired Kosmix, the startup he co-founded. Kosmix pioneered semantic search technology and semantic analysis of social media. In 1996, Rajaraman co-founded Junglee, an e-commerce pioneer. As Chief Technology Officer, he played a key role in developing Junglee's award-winning Virtual Database technology. In 1998, Amazon acquired Junglee, and Rajaraman helped launch the transformation of Amazon from a retailer into a retail platform, enabling third-party retailers to sell on Amazon's website. He is also a co-inventor of Amazon Mechanical Turk, which pioneered the concepts of crowdsourcing and hybrid human-machine computation. As an academic, he has focused his research on the intersection of database systems, the Web, and social media. His research publications have won several awards at prestigious academic conferences, including two retrospective 10-year Best Paper awards at ACM SIGMOD and VLDB. In 2012, Fast Company magazine named Rajaraman in its list of '100 Most Creative People in Business'. In 2013, he was named a Distinguished Alumnus by his alma mater, IIT Madras. In addition to acting as a consulting assistant professor in the Computer Science Department at Stanford University, California, he is a spe

    Jeffrey David Ullman, Stanford University, California
    Jeffrey David Ullman is the Stanford W. Ascherman Professor of Computer Science (Emeritus) and the current CEO of Gradiance. His research interests include database theory, data mining, and education using the information infrastructure. He is one of the founders of the field of database theory and was the doctoral advisor of an entire generation of students who later became leading database theorists in their own right. He was the Ph.D. advisor of Sergey Brin, one of the co-founders of Google, and served on Google's technical advisory board. Ullman was elected to the National Academy of Engineering in 1989 and the American Academy of Arts and Sciences in 2012, and he has held Guggenheim and Einstein Fellowships. His awards include the Knuth Prize (2000), the SIGMOD E. F. Codd Innovations Award (2006), and the 2016 NEC C&C Foundation Prize (with Al Aho and John Hopcroft). Ullman is also the co-recipient (with John Hopcroft) of the 2010 IEEE John von Neumann Medal, for 'laying the foundations for the fields of automata and language theory and many seminal contributions to theoretical computer science'.
