Cambridge University Press


Mining of Massive Datasets

$69.00 (P)

  • Date Published: December 2011
  • availability: In stock
  • format: Hardback
  • isbn: 9781107015357



  • The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.

    • Teaching material has been 'road-tested' for several years at Stanford University
    • Includes a range of exercises to challenge even the most able student
    • Resources for instructors are available online, including slides, homework assignments, project requirements and exams

    Product details

    • Date Published: December 2011
    • format: Hardback
    • isbn: 9781107015357
    • length: 326 pages
    • dimensions: 253 x 178 x 23 mm
    • weight: 0.8kg
    • contains: 90 b/w illus. 160 exercises
    • availability: In stock
  • Table of Contents

    1. Data mining
    2. Large-scale file systems and map-reduce
    3. Finding similar items
    4. Mining data streams
    5. Link analysis
    6. Frequent itemsets
    7. Clustering
    8. Advertising on the Web
    9. Recommendation systems

  • Resources for

    Mining of Massive Datasets

    Anand Rajaraman, Jeffrey David Ullman

    General Resources

    Welcome to the resources site

    Here you will find free-of-charge online materials to accompany this book. The range of materials we provide across our academic and higher education titles is an integral part of the book package, whether you are a student, instructor, researcher or professional.

    This title has one or more locked files, and access is given only to instructors adopting the textbook for their class. We need to enforce this strictly so that solutions are not made available to students. To gain access to locked resources, you first need to either sign in or register for an account.

    These resources are provided free of charge by Cambridge University Press with permission of the author of the corresponding work, but are subject to copyright. You are permitted to view, print and download these resources for your own personal use only, provided any copyright lines on the resources are not removed or altered in any way. Any other use, including but not limited to distribution of the resources in modified form, or via electronic or other media, is strictly prohibited unless you have permission from the author of the corresponding work and provided you give appropriate acknowledgement of the source.


  • Instructors have used or reviewed this title for the following courses

    • Analytical Thinking
    • Big Data Ecosystems
    • Bioinformatics Resources
    • Data Mining I
    • Data Mining, Search Engines and Distributed Databases
    • Database and Big Data Management Seminar
    • Decision Support and Knowledge Management Research
    • Intro to informatics
    • Mining of Massive Datasets
    • Modeling and Optimization Techniques
    • Physiology and Biophysics 500
    • Special Studies in CS: Mining Massive Data Sets
  • Authors

    Anand Rajaraman, WalmartLabs
    Anand Rajaraman is CEO of Kosmix Inc., a website which organizes the Internet by topic. He is also a consulting assistant professor in the Computer Science Department at Stanford University. In 1996, together with four other engineers, Rajaraman founded Junglee Corp., which pioneered Internet comparison shopping. It was acquired by Amazon.com Inc. in August 1998 for 1.6 million shares of stock valued at $250 million. Rajaraman went on to become Director of Technology at Amazon.com, where he was responsible for technology strategy. He helped launch the transformation of Amazon.com from a retailer into a retail platform, enabling third-party retailers to sell on Amazon.com's website. Third-party transactions now account for almost 25% of all US transactions, and represent Amazon's fastest-growing and most profitable business segment. Rajaraman was also an inventor of the concept underlying Amazon.com's Mechanical Turk. Rajaraman and his business partner, Venky Harinarayan, co-founded Cambrian Ventures, an early stage VC fund, in 2000. Cambrian went on to back several companies later acquired by Google and has funded companies like Mobissimo, Aster Data Systems and

    Jeffrey David Ullman, Stanford University, California
    Jeffrey David Ullman is the Stanford W. Ascherman Professor of Computer Science (Emeritus) at Stanford University. He is also the CEO of Gradiance. Ullman's research interests include database theory, data integration, data mining and education using the information infrastructure. He is one of the founders of the field of database theory and was the doctoral advisor of an entire generation of students who later became leading database theorists in their own right. He was also the Ph.D. advisor of Sergey Brin, one of the co-founders of Google, and served on Google's technical advisory board. In 1995 he was inducted as a Fellow of the Association for Computing Machinery and in 2000 he was awarded the Knuth Prize. Ullman is also the co-recipient (with John Hopcroft) of the 2010 IEEE John von Neumann Medal, for 'laying the foundations for the fields of automata and language theory and many seminal contributions to theoretical computer science'.
