Chapter Objectives
In this chapter, you will learn:
• what is meant by “Big Data” and its “5 Vs”;
• how traditional database management systems differ from Big Data technologies such as Hadoop;
• how traditional data warehousing approaches differ from Big Data technologies;
• to identify the trade-offs involved in adopting a Big Data stack;
• how Big Data relates to NoSQL databases.
Opening Scenario
Sober has taken its first steps in setting up a data warehouse on top of its existing DBMS stack to kick-start its business intelligence activities, mainly for reporting purposes. However, Sober's management has lately been hearing a lot about "Big Data" and "Hadoop" with regard to storing and analyzing huge amounts of data, and wonders whether these technologies would offer any added benefit. So far, Sober is happy with its relational DBMS, which integrates nicely with its business intelligence tooling. The mobile development team, meanwhile, has been happy to work with MongoDB, a NoSQL database, to handle the increased workload from mobile users. Hence, the question for Sober is: what would a Big Data stack offer on top of this?
Data are everywhere. To give some staggering examples, IBM projects that every day we generate 2.5 quintillion bytes of data. Every minute, more than 300,000 tweets are created, Netflix subscribers stream more than 70,000 hours of video, Apple users download 30,000 apps, and Instagram users like almost two million photos. In relative terms, 90% of the data in the world have been created in the last two years. These massive amounts of data yield an unprecedented treasure trove of customer knowledge, ready to be analyzed with state-of-the-art analytical techniques to better understand and exploit customer behavior, identifying new business opportunities and new strategies. In this chapter, we zoom into the concept of "Big Data" and explain how these huge data troves are changing the world of DBMSs. We kick off by reviewing the 5 Vs of Big Data. Next, we look at the common Big Data technologies in use today: Hadoop, SQL on Hadoop, and Apache Spark.