This chapter discusses the limitations that sorters of atomic items incur when applied to variable-length items (aka strings). It then introduces a simple yet effective comparison-based lower bound, which is eventually matched by an elegant variant of QuickSort, named Multi-key QuickSort, designed specifically to deal with strings. The structure of this string sorter will also allow us to introduce an interesting, powerful, and dynamic data structure for string indexing, the ternary search tree, which supports efficient prefix searches over a dynamic string dictionary that fits in the internal memory of a computer. The case of large string dictionaries that do not fit in the internal memory of a computer is discussed in Chapter 9.
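As a rough illustration of the idea behind Multi-key QuickSort, the sketch below partitions the strings three ways on a single character at a time and advances to the next character only for the strings that match the pivot character. It is a minimal, out-of-place version written for readability, not the chapter's own pseudocode; the function name `multikey_quicksort` and the middle-element pivot choice are assumptions made for this example.

```python
def multikey_quicksort(strings, depth=0):
    """Return a sorted copy of `strings`.

    Invariant: all strings in the input share their first `depth` characters.
    Illustrative sketch only (builds new lists instead of sorting in place).
    """
    if len(strings) <= 1:
        return list(strings)

    # Strings that end exactly at `depth` are prefixes of all the others,
    # so they come first in lexicographic order.
    done = [s for s in strings if len(s) == depth]
    rest = [s for s in strings if len(s) > depth]
    if not rest:
        return done

    # Assumption: pivot is the character at `depth` of the middle string.
    pivot = rest[len(rest) // 2][depth]
    less = [s for s in rest if s[depth] < pivot]
    equal = [s for s in rest if s[depth] == pivot]
    greater = [s for s in rest if s[depth] > pivot]

    # Only the `equal` group moves on to the next character.
    return (done
            + multikey_quicksort(less, depth)
            + multikey_quicksort(equal, depth + 1)
            + multikey_quicksort(greater, depth))
```

The three-way split on one character mirrors the shape of a ternary search tree node, which has a left, a middle, and a right child for characters smaller than, equal to, and larger than the node's character.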
This chapter and the next extend the treatment of the dictionary problem to more sophisticated forms of key matching, namely prefix match and substring match between a variable-length pattern string and all strings of an input dictionary. In particular, this chapter addresses the former problem, which occurs in many real-life applications concerned, first and foremost, with key-value stores and search engines. Discussion starts with very simple array-based solutions for internal and external memory (i.e. disks), and then moves to evaluate their time, space, and I/O complexities, which motivates the introduction of more advanced solutions for string compression (namely front coding and locality-preserving front coding) and data-structure design for prefix string search (namely compacted tries and Patricia tries). The chapter concludes with a discussion on the management of dynamic and very large string dictionaries, which leads to the description of String B-trees. As in all previous chapters, the algorithmic discussion is enriched with pseudocode, illustrative figures, and many running examples.
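To give a feel for front coding, the sketch below stores each string of a sorted dictionary as a pair: the length of the prefix it shares with its predecessor, and the remaining suffix. This is plain front coding under simplifying assumptions (Python tuples rather than a packed byte layout, and no locality-preserving variant); the names `front_encode` and `front_decode` are hypothetical.

```python
def front_encode(sorted_strings):
    """Encode a lexicographically sorted list as (shared_prefix_len, suffix) pairs."""
    pairs = []
    prev = ""
    for s in sorted_strings:
        # Length of the longest common prefix with the previous string.
        lcp = 0
        while lcp < min(len(prev), len(s)) and prev[lcp] == s[lcp]:
            lcp += 1
        pairs.append((lcp, s[lcp:]))
        prev = s
    return pairs


def front_decode(pairs):
    """Rebuild the original strings by scanning the pairs left to right."""
    strings = []
    prev = ""
    for lcp, suffix in pairs:
        s = prev[:lcp] + suffix
        strings.append(s)
        prev = s
    return strings
```

For instance, `front_encode(["alcatraz", "alcohol", "alcoholic", "alcove"])` yields `[(0, "alcatraz"), (3, "ohol"), (7, "ic"), (4, "ve")]`. Decoding any string requires a scan from the start of the list, which is the cost the locality-preserving variant is meant to bound.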
This chapter deals with the design of compressed data structures, an algorithmic field born just 30 years ago which now offers plenty of compressed solutions for most, if not all, classic data structures such as arrays, trees, and graphs. This last chapter aims to give just an idea of these novel approaches to data structure design, by discussing the ones that we consider the most significant and fruitful from an educational point of view. A side effect of this discussion will be the introduction of the paradigm called “pointerless programming,” which waives the explicit use of pointers (and thus integer offsets of four to eight bytes to index arbitrary items, such as strings, nodes, or edges) and instead uses compressed data structures built upon proper binary arrays that efficiently subsume the pointers and support some interesting query operations over them efficiently, and in some cases optimally.
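One common way to query a binary array without pointers is through rank and select operations. The abstract does not name them, so treating them as the "query operations" here is an assumption. The sketch below is deliberately uncompressed: it keeps the bits in a Python list plus one counter per block, whereas practical structures pack the bits and use far less extra space. The class name `RankBitVector` and the block size are hypothetical.

```python
class RankBitVector:
    """Toy rank/select over a binary array, using per-block counters."""

    def __init__(self, bits, block=4):
        self.bits = list(bits)
        self.block = block
        # counts[b] = number of 1s in bits[0 : b * block]
        self.counts = [0]
        for i in range(0, len(self.bits), block):
            self.counts.append(self.counts[-1] + sum(self.bits[i:i + block]))

    def rank1(self, i):
        """Number of 1s in bits[0:i]."""
        b = i // self.block
        return self.counts[b] + sum(self.bits[b * self.block:i])

    def select1(self, k):
        """Position of the k-th 1 (k >= 1), found by binary search on rank1."""
        if k < 1 or k > self.rank1(len(self.bits)):
            raise ValueError("k out of range")
        lo, hi = 0, len(self.bits)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) < k:
                lo = mid + 1
            else:
                hi = mid
        return lo
```

With such operations, a 1 bit can mark where each item starts inside a concatenated sequence, and `select1(k)` recovers the start of the k-th item without storing an explicit offset per item.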
This chapter deals with the design of data structures and algorithms for the substring search problem, which occurs mainly in computational biology and textual database applications to date. Most of the chapter is devoted to describing the two main data-structure champions in this context, the suffix array and the suffix tree. Several pseudocodes and illustrative examples enrich this discussion, which is accompanied by the evaluation of time, space, and I/O complexities incurred by their construction and by the execution of some powerful query operations. In particular, the chapter deals with the efficient/optimal construction of large suffix arrays in external memory, hence describing the DC3 algorithm and the I/O-efficient scan-based algorithm proposed by Gonnet, Baeza-Yates, and Snider, and the efficient direct construction of suffix trees, via McCreight’s algorithm, or via suffix arrays and LCP arrays. It will also detail the elegant construction of this latter array in internal memory, which is fundamental for several text-mining applications, some of which are described at the end of the chapter.
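The relation between a text, its suffix array, and its LCP array can be sketched in a few lines. The suffix array below is built by naively sorting suffixes, which is far from the efficient constructions the chapter covers (such as DC3) and is used only to keep the example short. The LCP array is computed with the linear-time technique usually attributed to Kasai et al., which may or may not be the exact construction the chapter presents. All function names are hypothetical.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[r] = length of the longest common prefix of suffixes sa[r-1] and sa[r]."""
    n = len(text)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    h = 0
    # Visit suffixes in text order; the LCP can drop by at most 1 per step.
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def occurrences(text, sa, pattern):
    """Substring search: two binary searches delimit the suffixes prefixed by pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])
```

For the text `"banana"`, the suffix array is `[5, 3, 1, 0, 4, 2]`, the LCP array is `[0, 1, 3, 0, 0, 2]`, and `occurrences("banana", sa, "ana")` returns `[1, 3]`.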