This chapter revisits the classic sorting problem in the context of big inputs, where “atomic” in the title refers to the fact that items occupy few memory words and are managed in their entirety by executing only comparisons. It discusses two classic sorting paradigms: the merge-based paradigm, which underlies the design of MergeSort, and the distribution-based paradigm, which underlies the design of QuickSort. It shows how to adapt them to a hierarchical memory setting, analyzes their I/O complexity, and finally proposes some useful algorithmic tools that speed up their execution in practice, such as the Snow-Plow technique and data compression. It also proves that these adaptations are I/O-optimal in the two-level memory model by providing a sophisticated, yet very informative, lower bound. These results allow us to relate the sorting problem to the so-called permuting problem, typically neglected when dealing with sorting in the RAM model, and then to establish an interesting I/O-complexity equivalence between the two problems, which provides mathematical grounds for the ubiquitous use of sorters in the design of I/O-efficient solutions to big data problems.
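To make the merge-based paradigm concrete, here is a minimal Python sketch of the k-way merge step at the heart of external-memory MergeSort. The sorted runs are modeled as in-memory lists purely for illustration; a real external sorter would stream one block of each run from disk at a time, which is what drives the I/O analysis.

```python
import heapq

def kway_merge(runs):
    # Merge k sorted runs into one sorted output using a min-heap:
    # each heap entry (value, run index, position) plays the role of
    # the current head of a disk run in external-memory MergeSort.
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, i, j = heapq.heappop(heap)
        out.append(value)
        if j + 1 < len(runs[i]):
            # Advance within run i: in an external setting this is where
            # the next block of run i would be fetched from disk.
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
    return out

print(kway_merge([[1, 5, 9], [2, 3, 8], [4, 6, 7]]))  # [1, 2, ..., 9]
```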
This chapter discusses the limitations incurred by sorters of atomic items when they are applied to sort variable-length items (a.k.a. strings). It then introduces a simple, yet effective, comparison-based lower bound, which is eventually matched by means of an elegant variant of QuickSort, named Multi-key QuickSort, properly designed to deal with strings. The structure of this string sorter will also allow us to introduce an interesting, powerful, and dynamic data structure for string indexing, the ternary search tree, which supports efficient prefix searches over a dynamic string dictionary that fits in the internal memory of a computer. The case of large string dictionaries that do not fit in the internal memory of a computer is discussed in Chapter 9.
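The key idea behind Multi-key QuickSort (commonly credited to Bentley and Sedgewick) is three-way partitioning on a single character position, recursing on the next position only within the “equal” part. The following compact, non-in-place Python sketch illustrates that idea; it is an illustrative rendering, not the chapter’s pseudocode.

```python
def char_at(s, d):
    # The d-th character code, or -1 as an end-of-string sentinel.
    return ord(s[d]) if d < len(s) else -1

def multikey_quicksort(strings, d=0):
    # Invariant: all strings in this call agree on their first d characters.
    if len(strings) <= 1:
        return strings
    pivot = char_at(strings[len(strings) // 2], d)
    less    = [s for s in strings if char_at(s, d) < pivot]
    equal   = [s for s in strings if char_at(s, d) == pivot]
    greater = [s for s in strings if char_at(s, d) > pivot]
    # Strings equal on character d are sorted on character d+1, unless the
    # pivot is the end-of-string sentinel (then they are all identical).
    if pivot >= 0:
        equal = multikey_quicksort(equal, d + 1)
    return multikey_quicksort(less, d) + equal + multikey_quicksort(greater, d)

print(multikey_quicksort(["banana", "band", "apple", "bandana", "ape"]))
# ['ape', 'apple', 'banana', 'band', 'bandana']
```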
Given two infinite sets, is there a sensible way to decide which one is larger? For instance, if A is the set of even integers, and B is the interval [0, 1], is there a way to compare their sizes? In this section, we focus on such questions and introduce the notion of cardinality, which is used to describe “how many elements” a (potentially infinite) set has. This leads to some interesting and counterintuitive consequences. However, we must first prepare the ground by discussing injections, surjections, bijections, and related results.
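As a concrete warm-up in the spirit of this chapter, one can check that the set E of even integers has the same cardinality as the whole of Z, despite being a proper subset, by exhibiting a bijection:

```latex
\[
  f : \mathbb{Z} \to E, \qquad f(n) = 2n .
\]
% f is injective, since 2m = 2n implies m = n, and surjective,
% since every even integer k is the image of k/2.
```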
In this chapter, we discuss relations, a central notion in mathematics. As we will shortly see, we have already encountered many mathematical relations without using this terminology. We begin by formally defining what a relation is and then introduce a special type of relation, the equivalence relation, together with the associated notion of an equivalence class. In Section 7.4, we study an important and useful equivalence relation: congruence modulo n.
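For instance, congruence modulo n, the relation studied in Section 7.4, is an equivalence relation; the three defining properties follow directly from the definition, as sketched below.

```latex
\[
  a \equiv b \pmod{n} \;\iff\; n \mid (a - b).
\]
% Reflexivity:  n \mid (a - a) = 0.
% Symmetry:     if n \mid (a - b), then n \mid (b - a) = -(a - b).
% Transitivity: if n \mid (a - b) and n \mid (b - c),
%               then n \mid (a - b) + (b - c) = (a - c).
```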
The present and the following chapter extend the treatment of the dictionary problem to more sophisticated forms of key matching, namely prefix match and substring match between a variable-length pattern string and all strings of an input dictionary. In particular, this chapter addresses the former problem, which occurs in many real-life applications, first and foremost key-value stores and search engines. The discussion starts with very simple array-based solutions for internal and external memory (i.e., disks), and then evaluates their time, space, and I/O complexities, which motivates the introduction of more advanced solutions for string compression (i.e., front coding and locality-preserving front coding) and data-structure design for prefix string search (i.e., compacted tries and Patricia tries). The chapter concludes with a discussion of the management of dynamic and very large string dictionaries, which leads to the description of String B-trees. As in all previous chapters, the algorithmic discussion is enriched with pseudocodes, illustrative figures, and many running examples.
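To give a feel for the string-compression tools mentioned above, here is a minimal Python sketch of front coding over a lexicographically sorted dictionary: each string is replaced by the length of the prefix it shares with its predecessor plus its remaining suffix. This is an illustrative toy using plain in-memory tuples, not the chapter’s disk-oriented layout.

```python
def front_code(sorted_strings):
    # Encode each string as (length of prefix shared with the previous
    # string, remaining suffix). Sorted input maximizes shared prefixes.
    encoded, prev = [], ""
    for s in sorted_strings:
        lcp = 0
        while lcp < min(len(prev), len(s)) and prev[lcp] == s[lcp]:
            lcp += 1
        encoded.append((lcp, s[lcp:]))
        prev = s
    return encoded

def front_decode(encoded):
    # Reconstruct each string by reusing the stated prefix of its predecessor.
    strings, prev = [], ""
    for lcp, suffix in encoded:
        s = prev[:lcp] + suffix
        strings.append(s)
        prev = s
    return strings

words = ["http://a.com", "http://a.org", "http://b.com"]
codes = front_code(words)
assert front_decode(codes) == words
print(codes)  # [(0, 'http://a.com'), (9, 'org'), (7, 'b.com')]
```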
This chapter deals with the design of compressed data structures, an algorithmic field born only about thirty years ago, which now offers plenty of compressed solutions for most, if not all, classic data structures, such as arrays, trees, and graphs. This last chapter aims to give an idea of these novel approaches to data-structure design by discussing the ones we consider the most significant and fruitful from an educational point of view. A side effect of this discussion is the introduction of the paradigm called “pointerless programming,” which waives the explicit use of pointers (and thus of integer offsets of four to eight bytes to index arbitrary items, such as strings, nodes, or edges) and instead uses compressed data structures built upon proper binary arrays that succinctly subsume the pointers and efficiently, often optimally, support some interesting query operations over them.
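As a tiny taste of pointerless programming, the sketch below answers rank queries over a binary array using a one-level directory of block counts plus a short in-block scan; real succinct structures use two directory levels and word-level popcounts to reach constant time within succinct space, but the idea is the same. The class name and block size are illustrative choices, not the chapter’s notation.

```python
class RankBitvector:
    # rank1(i) = number of 1s in bits[0..i), answered from a precomputed
    # directory of per-block counts plus a short scan inside one block.
    def __init__(self, bits, block=8):
        self.bits, self.block = bits, block
        self.counts = [0]  # counts[b] = number of 1s before block b
        for start in range(0, len(bits), block):
            self.counts.append(self.counts[-1] + sum(bits[start:start + block]))

    def rank1(self, i):
        b = i // self.block
        return self.counts[b] + sum(self.bits[b * self.block:i])

bv = RankBitvector([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
print(bv.rank1(10))  # 6: there are six 1s in the whole array
```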
Real analysis is a branch of mathematics focusing on the study of real numbers and related objects. Sets of real numbers, sequences, functions, and series of real numbers are at the core of the subject. The notions of limit and convergence are central in analysis and are used to investigate such objects. Learning real analysis means, in part, deepening our understanding of calculus topics and studying their theoretical foundations. For these reasons, many view real analysis as a rigorous version of calculus. In this chapter, we look at how limits of sequences and functions can be formally defined. The precise definitions may require some effort to grasp, but they are absolutely essential for advanced studies in mathematics and related fields. Formal definitions of limits allow us not only to prove various statements (such as the Extreme and Intermediate Value Theorems, often proved in a real analysis course), but also to investigate more complicated functions and sequences. Our experience with proof writing and logical statements will be invaluable for this discussion. We also highlight the use of limits in defining continuity and differentiability of functions.
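For concreteness, the formal definition the chapter builds up to in the case of sequences is the familiar epsilon–N statement:

```latex
\[
  \lim_{n \to \infty} a_n = L
  \;\iff\;
  \forall \varepsilon > 0 \;\; \exists N \in \mathbb{N} \;\;
  \forall n \ge N : \; |a_n - L| < \varepsilon .
\]
% In words: however small a tolerance eps we are given, all terms of the
% sequence from some index N onward lie within eps of the limit L.
```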