Chapter 19: Big Data

pp. 626-663

Authors

, KU Leuven, Belgium, , KU Leuven, Belgium, , KU Leuven, Belgium

Summary

Chapter Objectives

In this chapter, you will learn:

  • what is meant by “Big Data” and its “5 Vs”;
  • to understand the differences between traditional database management systems and Big Data technologies such as Hadoop;
  • to understand the differences between traditional data warehousing approaches and Big Data technologies;
  • to identify the tradeoffs when opting to adopt a Big Data stack;
  • what the links are between Big Data and NoSQL databases.

Opening Scenario

Sober has taken its first steps in setting up a data warehouse on top of its existing DBMS stack to kick-start its business intelligence activities, mainly for reporting purposes. However, Sober's management has lately been hearing a lot about “Big Data” and “Hadoop” with regard to storing and analyzing huge amounts of data, and is wondering whether these technologies would offer an added benefit. So far, Sober is happy with its relational DBMS, which integrates nicely with its business intelligence tooling. The mobile development team, meanwhile, has been happy to work with MongoDB, a NoSQL database, to handle the increased workload from mobile users. Hence, the question for Sober is: what would a Big Data stack offer on top of this?

Data are everywhere. To give some staggering examples, IBM projects that every day we generate 2.5 quintillion bytes of data. Every minute, more than 300,000 tweets are created, Netflix subscribers stream more than 70,000 hours of video, Apple users download 30,000 apps, and Instagram users like almost two million photos. In relative terms, 90% of the data in the world have been created in the last two years. These massive amounts of data yield an unprecedented treasure of internal customer knowledge, ready to be analyzed using state-of-the-art analytical techniques to better understand and exploit customer behavior by identifying new business opportunities together with new strategies. In this chapter, we zoom in on this concept of “Big Data” and explain how these huge data troves are changing the world of DBMSs. We kick off by reviewing the 5 Vs of Big Data. Next, we turn to the common Big Data technologies in use today: Hadoop, SQL on Hadoop, and Apache Spark.
