Book contents
- Frontmatter
- Contents
- Miscellaneous Frontmatter
- Notation
- Preface
- Part I Preliminaries
- 1 Molecular biology and high-throughput sequencing
- 2 Algorithm design
- 3 Data structures
- 4 Graphs
- 5 Network flows
- Part II Fundamentals of Biological Sequence Analysis
- Part III Genome-Scale Index Structures
- Part IV Genome-Scale Algorithms
- Part V Applications
- References
- Index
2 - Algorithm design
from Part I - Preliminaries
Published online by Cambridge University Press: 05 May 2015
Summary
Among the general algorithm design techniques, dynamic programming is the one most heavily used in this book. Since large parts of Chapters 4 and 6 are devoted to introducing this topic in a unified manner, we shall not introduce the technique here. However, a taste of it is already provided by the solution to Exercise 2.2 of this chapter, which asks for a simple one-dimensional instance.
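Exercise 2.2 itself is not reproduced in this summary. As an illustrative sketch only (not the book's exercise), a classic one-dimensional dynamic program is the maximum-sum contiguous subarray problem: the table entry "best sum of a subarray ending at position i" depends only on the entry at position i - 1.

```python
def max_subarray_sum(a):
    """One-dimensional dynamic programming over a non-empty list:
    best_ending_here is the optimal value for subarrays ending at the
    current position; each step reuses the previous subproblem's answer."""
    best_ending_here = best_overall = a[0]
    for x in a[1:]:
        # Either extend the previous subarray or start a fresh one at x.
        best_ending_here = max(x, best_ending_here + x)
        best_overall = max(best_overall, best_ending_here)
    return best_overall
```

The single pass over the input makes this an O(n)-time, O(1)-working-space dynamic program, the simplest shape the technique takes before the two-dimensional tables of later chapters.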
We will now cover some basic primitives that are later implicitly assumed to be known.
Complexity analysis
The algorithms described in this book are typically analyzed for their worst-case running time and space complexity: complexity is expressed as a function of the input parameters on the worst possible input. For example, if the input is a string of length n from an alphabet of size σ, a linear-time algorithm works in O(n) time, where O(·) is the familiar big-O notation, which hides constant factors. This means that the running time is upper bounded by cn elementary operations, where c is some constant.
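To make the notion of "cn elementary operations" concrete, here is a minimal sketch (not from the book): a single pass over a string that tallies symbol frequencies, performing a constant amount of work per character and hence running in O(n) time.

```python
def count_symbols(s):
    """Tally symbol frequencies in one pass over the string s.
    Each iteration does a constant number of elementary operations
    (one lookup, one addition, one store), so the total is at most
    c * n operations for some constant c, i.e. O(n) time."""
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    return counts
```

Doubling the input length doubles the operation count; the hidden constant c depends on implementation details the big-O notation deliberately ignores.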
We consider the alphabet size not to be a constant; that is, we use expressions of the form O(nσ) and O(n log σ), which are not linear complexity bounds. For the space requirements, we frequently use notations such as n log σ(1 + o(1)) = n log σ + o(n log σ). Here o(·) denotes a function that grows asymptotically strictly slower than its argument. For example, O(n log σ / log log n) can be simplified to o(n log σ).

Algorithms have an input and an output: by working space we mean the extra space required by an algorithm in addition to its input and output. Most of our algorithms are for processing DNA, where σ = 4, and it would seem that one could omit such a small constant in this context.
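The n log σ space bound can be made tangible with a small sketch (an illustration, not the book's representation): for DNA with σ = 4, each symbol needs only log₂ 4 = 2 bits, so a string of length n fits in 2n bits rather than the 8n bits of a byte-per-character encoding.

```python
def pack_dna(s):
    """Pack a DNA string over {A, C, G, T} into a single integer using
    2 bits per symbol, illustrating the n * log2(sigma) space bound
    for sigma = 4. Returns the packed value and its size in bits."""
    code = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
    bits = 0
    for ch in s:
        bits = (bits << 2) | code[ch]  # append the 2-bit code of ch
    return bits, 2 * len(s)
```

Whether the factor log σ may really be dropped for σ = 4 is exactly the question the chapter raises here; the packing above shows what the n log σ bits look like in practice.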
Genome-Scale Algorithm Design: Biological Sequence Analysis in the Era of High-Throughput Sequencing, pp. 10–19. Publisher: Cambridge University Press. Print publication year: 2015.