Book contents
- Frontmatter
- Contents
- Preface
- I Exact String Matching: The Fundamental String Problem
- II Suffix Trees and Their Uses
- III Inexact Matching, Sequence Alignment, Dynamic Programming
- 10 The Importance of (Sub)sequence Comparison in Molecular Biology
- 11 Core String Edits, Alignments, and Dynamic Programming
- 12 Refining Core String Edits and Alignments
- 13 Extending the Core Problems
- 14 Multiple String Comparison – The Holy Grail
- 15 Sequence Databases and Their Uses – The Mother Lode
- IV Currents, Cousins, and Cameos
- Epilogue – where next?
- Bibliography
- Glossary
- Index
10 - The Importance of (Sub)sequence Comparison in Molecular Biology
from III - Inexact Matching, Sequence Alignment, Dynamic Programming
Published online by Cambridge University Press: 23 June 2010
Summary
Sequence comparison, particularly when combined with the systematic collection, curation, and search of databases containing biomolecular sequences, has become essential in modern molecular biology. Commenting on the (then) near-completion of the effort to sequence the entire yeast genome (now finished), Stephen Oliver says
In a short time it will be hard to realize how we managed without the sequence data. Biology will never be the same again. [478]
One fact explains the importance of molecular sequence data and sequence comparison in biology.
The first fact of biological sequence analysis
In biomolecular sequences (DNA, RNA, or amino acid sequences), high sequence similarity usually implies significant functional or structural similarity.
Evolution reuses, builds on, duplicates, and modifies “successful” structures (proteins, exons, DNA regulatory sequences, morphological features, enzymatic pathways, etc.). Life is based on a repertoire of structured and interrelated molecular building blocks that are shared and passed around. The same and related molecular structures and mechanisms show up repeatedly in the genome of a single species and across a very wide spectrum of divergent species. “Duplication with modification” [127, 128, 129, 130] is the central paradigm of protein evolution, wherein new proteins and/or new biological functions are fashioned from earlier ones. Doolittle emphasizes this point as follows:
The vast majority of extant proteins are the result of a continuous series of genetic duplications and subsequent modifications.
- Type: Chapter
- Information: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, pp. 212-214. Publisher: Cambridge University Press. Print publication year: 1997.