Throughout the book we have covered a number of compact data structures that are well established and tested, and whose basic aspects can be considered stable. This chapter is conceived as an epilogue, where we take a look at the future. We describe some recent trends that, while not mature or general enough to deserve a thorough treatment in the book, are certainly promising and likely to be the focus of intense research in the upcoming years. The chapter may therefore also serve to guide readers looking for hot research topics. Its writing style differs from that of the other chapters, as we discuss the bibliography together with the main material.
First, we consider encoding data structures. These ensure only that a certain set of queries of interest can be answered; the actual data cannot be recovered. Encodings can offer significant space reductions with respect to the entropy of the data, and are interesting because they have the potential to reach the minimum space needed just to answer the desired queries; nothing superfluous is stored. They also lead to a new concept of entropy, defined not only in terms of a universe of objects but also of a set of queries posed on them. Encodings for a few problems already exist, but they are very recent and mostly at a theoretical stage.
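A classical example of an encoding is that for range minimum queries: the position of the minimum in any range A[i..j] depends only on the topology of the Cartesian tree of A, which can be stored in 2n + o(n) bits, so the queries can be answered without access to A at all. The following Python sketch illustrates the principle only (the function names are ours, and it stores the Cartesian tree with explicit parent pointers rather than in succinct form):

```python
def cartesian_tree_parents(A):
    """Parent pointers of the Cartesian tree of A (root has parent -1).
    Each node is an array position; smaller values lie closer to the root."""
    parent = [-1] * len(A)
    stack = []                      # rightmost path, deepest node on top
    for i in range(len(A)):
        last = -1
        while stack and A[stack[-1]] > A[i]:
            last = stack.pop()      # nodes larger than A[i] leave the path
        if last != -1:
            parent[last] = i        # last popped node becomes i's left child
        if stack:
            parent[i] = stack[-1]   # i becomes right child of the new top
        stack.append(i)
    return parent

def rmq(parent, i, j):
    """Position of the minimum of A[i..j]: the LCA of i and j in the tree.
    Naive pointer chasing; a succinct encoding supports this in O(1)."""
    def depth(x):
        d = 0
        while parent[x] != -1:
            x, d = parent[x], d + 1
        return d
    di, dj = depth(i), depth(j)
    while di > dj: i, di = parent[i], di - 1
    while dj > di: j, dj = parent[j], dj - 1
    while i != j:  i, j = parent[i], parent[j]
    return i

A = [3, 1, 4, 1, 5, 9, 2, 6]
par = cartesian_tree_parents(A)
# Once `par` is built, A itself is no longer needed to answer queries:
assert rmq(par, 2, 5) == 3   # A[3] = 1 is a minimum of A[2..5]
assert rmq(par, 4, 7) == 6   # A[6] = 2 is the minimum of A[4..7]
```

The 2n + o(n)-bit solutions replace the pointer-based tree with a balanced-parentheses representation that supports lowest common ancestor queries in constant time.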
Second, we consider repetitive document collections. Many of the largest text collections arising in applications these days are highly repetitive, and thus very compressible if one applies the right compression methods. The compact data structures we have seen, however, build on statistical compression, which is insensitive to this kind of compressibility. New text indexes building on other compression principles are being designed, although they are generally slower than the classical ones and do not yet reach the precise entropy bounds that current statistical methods achieve.
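The contrast is easy to see on a small example (a toy experiment of ours, not taken from the book): the order-0 empirical entropy of a text does not decrease when the text is repeated many times, so a statistical compressor keeps paying for every copy, whereas a Lempel-Ziv parse captures each repetition with a single phrase:

```python
import math
from collections import Counter

def h0_total_bits(s):
    """n * H0(s): total bits used by an ideal order-0 (statistical) coder."""
    n = len(s)
    return -sum(c * math.log2(c / n) for c in Counter(s).values())

def lz77_phrase_count(s):
    """Number of phrases in a naive greedy LZ77 parse: each phrase is the
    longest prefix of the rest that already occurs starting earlier in s."""
    i, z, n = 0, 0, len(s)
    while i < n:
        l = 0
        # sources may overlap position i, as in the classical LZ77
        while i + l < n and s.find(s[i:i + l + 1], 0, i + l) != -1:
            l += 1
        i += max(l, 1)              # copy l symbols, or emit 1 new symbol
        z += 1
    return z

base = "compact data structures "
text = base * 50                    # a highly repetitive collection
print(h0_total_bits(text) / h0_total_bits(base))  # 50.0: no gain at all
print(lz77_phrase_count(base), lz77_phrase_count(text))  # 23 vs 24 phrases
```

Grammar-based and Lempel-Ziv-based indexes exploit precisely this kind of regularity, which order-0 (and even higher-order) statistical entropy cannot capture.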
Finally, we consider secondary memory. Compact data structures use less space than classical ones and thus may fit in smaller and faster memories. However, they generally have less locality of reference, so they are slower than classical structures when both compete at the same level of the memory hierarchy. This is particularly relevant when the datasets are huge and the structures must operate on disk, where space usage is less important and locality of reference is crucial.