1 - Introduction

Published online by Cambridge University Press:  05 September 2016

Gonzalo Navarro
Affiliation:
Universidad de Chile

Summary

Why Compact Data Structures?

Google's stated mission, “to organize the world's information and make it universally accessible and useful,” could not better capture the immense ambition of modern society for gathering all kinds of data and putting them to use to improve our lives. We are collecting not only huge amounts of data from the physical world (astronomical, climatological, geographical, biological), but also human-generated data (voice, pictures, music, video, books, news, Web contents, emails, blogs, tweets) and society-based behavioral data (markets, shopping, traffic, clicks, Web navigation, likes, friendship networks).

Our hunger for more and more information is flooding our lives with data. Technology is improving and our ability to store data is growing fast, but the data we are collecting also grow fast – in many cases faster than our storage capacities. While our ability to store the data in secondary or perhaps tertiary storage does not yet seem to be compromised, performing the desired processing of these data in the main memory of computers is becoming more and more difficult. Since accessing a datum in main memory is about 10^5 times faster than on disk, operating in main memory is crucial for carrying out many data-processing applications.

In many cases, the problem is not so much the size of the actual data, but that of the data structures that must be built on the data in order to efficiently carry out the desired processing or queries. In some cases the data structures are one or two orders of magnitude larger than the data! For example, the DNA of a human genome, of about 3.3 billion bases, requires slightly less than 800 megabytes if we use only 2 bits per base (A, C, G, T), which fits in the main memory of any desktop PC. However, the suffix tree, a powerful data structure used to efficiently perform sequence analysis on the genome, requires at least 10 bytes per base, that is, more than 30 gigabytes.
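The arithmetic behind these figures can be checked with a short sketch. The constants come from the paragraph above; the function names are illustrative only. Note that "slightly less than 800 megabytes" corresponds to binary megabytes (2^20 bytes each); in decimal units the packed genome is 825 MB.

```python
# Back-of-the-envelope check of the sizes quoted in the text.
BASES = 3_300_000_000             # about 3.3 billion bases
BITS_PER_BASE = 2                 # enough to distinguish A, C, G, T
SUFFIX_TREE_BYTES_PER_BASE = 10   # lower bound quoted in the text

MIB = 2**20  # binary megabyte
GIB = 2**30  # binary gigabyte


def packed_size_bytes(n_bases: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Bytes needed to store n_bases symbols at bits_per_base bits each."""
    return (n_bases * bits_per_base + 7) // 8  # round up to whole bytes


def suffix_tree_size_bytes(
    n_bases: int, bytes_per_base: int = SUFFIX_TREE_BYTES_PER_BASE
) -> int:
    """Lower-bound size of a suffix tree at bytes_per_base bytes per symbol."""
    return n_bases * bytes_per_base


genome_bytes = packed_size_bytes(BASES)
tree_bytes = suffix_tree_size_bytes(BASES)

print(f"packed genome: {genome_bytes / MIB:.1f} MiB")
print(f"suffix tree:   {tree_bytes / GIB:.1f} GiB")
print(f"blow-up:       {tree_bytes // genome_bytes}x")
```

Under these assumptions the suffix tree is 40 times larger than the packed sequence, which is the "one or two orders of magnitude" gap the text describes.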

The main techniques to cope with the growing size of data over recent years can be classified into three families:

Efficient secondary-memory algorithms. While accessing a random datum from disk is comparatively very slow, subsequent data are read much faster, only 100 times slower than from main memory. Therefore, algorithms that minimize the random accesses to the data can perform reasonably well on disk.
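A rough cost model makes the point concrete. This is a simplified sketch, not a description of any real disk: it measures cost in units of one main-memory access, takes the random-access penalty as roughly 10^5 and the sequential penalty as 100 (the figures given in the text), and the function name `disk_cost` is hypothetical.

```python
# Relative access costs, in units of one main-memory access (figures from the text).
RANDOM_DISK_COST = 10**5      # a random disk access
SEQUENTIAL_DISK_COST = 100    # each datum read right after the previous one


def disk_cost(n_items: int, n_random_accesses: int) -> int:
    """Estimated cost of reading n_items from disk with n_random_accesses seeks.

    Every item that is not reached by a random access is assumed to be
    read sequentially after the previous one.
    """
    if not 0 <= n_random_accesses <= n_items:
        raise ValueError("n_random_accesses must be between 0 and n_items")
    n_sequential = n_items - n_random_accesses
    return (n_random_accesses * RANDOM_DISK_COST
            + n_sequential * SEQUENTIAL_DISK_COST)


n = 1_000_000
all_random = disk_cost(n, n)   # every item needs its own seek
one_scan = disk_cost(n, 1)     # one seek, then a sequential scan

print(all_random, one_scan, all_random / one_scan)
```

In this model, reading a million items with one seek followed by a scan is about a thousand times cheaper than fetching each item at random, which is why disk-friendly algorithms try to trade random accesses for sequential ones.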

Type: Chapter
Information: Compact Data Structures: A Practical Approach, pp. 1–13
Publisher: Cambridge University Press
Print publication year: 2016

  • Introduction
  • Gonzalo Navarro, Universidad de Chile
  • Book: Compact Data Structures
  • Online publication: 05 September 2016
  • Chapter DOI: https://doi.org/10.1017/CBO9781316588284.002