Book contents
- Frontmatter
- Contents
- Preface
- I Exact String Matching: The Fundamental String Problem
- II Suffix Trees and Their Uses
- III Inexact Matching, Sequence Alignment, Dynamic Programming
- IV Currents, Cousins, and Cameos
- 16 Maps, Mapping, Sequencing, and Superstrings
- 17 Strings and Evolutionary Trees
- 18 Three Short Topics
- 19 Models of Genome-Level Mutations
- Epilogue – where next?
- Bibliography
- Glossary
- Index
18 - Three Short Topics
from IV - Currents, Cousins, and Cameos
Published online by Cambridge University Press: 23 June 2010
Summary
Matching DNA to protein with frameshift errors
In Section 15.11.3, we discussed the canonical advice of translating any newly sequenced gene into a derived amino acid sequence to search the protein databases for similarities to the new sequence. This is in contrast to searching DNA databases with the original DNA string. There is, however, a technical problem with using derived amino acid sequences. If a single nucleotide is missing from the DNA transcript, then the reading frame of the succeeding DNA will be changed (see Figure 18.1). A similar problem occurs if a nucleotide is incorrectly inserted into the transcript. Until the correct reading frame is reestablished (through additional errors), most of the translated amino acids will be incorrect, invalidating most comparisons made to the derived amino acid sequence.
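The effect of a single missing nucleotide can be seen in a small sketch. The DNA string, the deleted position, and the helper name `translate` below are illustrative assumptions, not taken from the book; the codon table is the standard genetic code, with `*` marking a stop codon.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((a, b, c) for a in BASES for b in BASES for c in BASES), AMINO
    )
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 0; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical example sequence (not from the text).
original = "ATGGCCAAGCTGACCGAT"
# Simulate a sequencing error: the nucleotide at index 4 is missing.
with_deletion = original[:4] + original[5:]

print(translate(original))       # MAKLTD
print(translate(with_deletion))  # MAS*P
```

In this example the amino acids after the deletion no longer match the intended translation, and a spurious stop codon appears, so a comparison against the correct protein would score poorly from that point on.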
Insertion and deletion errors during DNA sequencing are fairly common, so frameshift errors can seriously affect the subsequent analysis. Those errors are in addition to any substitution errors that leave the reading frame unchanged. Moreover, informative alignments often contain a relatively small number of exactly matching characters and larger regions of more poorly aligned substrings (see Section 11.7 on local alignment). As a result, two substrings that would align well without a frameshift error but align poorly with one can easily be mistaken for regions that align poorly due only to substitution errors. Without some additional technique, it is therefore easy to miss frameshift errors and hard to correct them.
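One naive way to see where a reading frame has shifted is to translate the DNA in all three forward frames and look for the expected amino acids in a different frame. The sketch below is only an illustration under assumed inputs (the sequence and the helper names `translate` and `three_frames` are hypothetical); it is not the technique developed in the chapter.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA starting at offset `frame`; partial codons are ignored."""
    end = len(dna) - (len(dna) - frame) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, end, 3))


def three_frames(dna: str) -> list[str]:
    """Return the translations in forward reading frames 0, 1, and 2."""
    return [translate(dna, frame) for frame in range(3)]


# Hypothetical sequence whose intended translation is MAKLTD,
# with the nucleotide at index 4 of the intended DNA missing.
with_deletion = "ATGGCAAGCTGACCGAT"

for frame, protein in enumerate(three_frames(with_deletion)):
    print(frame, protein)
# 0 MAS*P
# 1 WQADR
# 2 GKLTD
```

Here the tail of the intended protein (`KLTD`) reappears in frame 2, downstream of the deletion. Spotting this by eye works for a toy case; with substitution errors also present, the shift is much harder to separate from ordinary poor alignment.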
- Type: Chapter
- Information: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, pp. 480-491
- Publisher: Cambridge University Press
- Print publication year: 1997