Book contents
- Frontmatter
- Contents
- Preface
- I Exact String Matching: The Fundamental String Problem
- II Suffix Trees and Their Uses
- III Inexact Matching, Sequence Alignment, Dynamic Programming
- 10 The Importance of (Sub)sequence Comparison in Molecular Biology
- 11 Core String Edits, Alignments, and Dynamic Programming
- 12 Refining Core String Edits and Alignments
- 13 Extending the Core Problems
- 14 Multiple String Comparison – The Holy Grail
- 15 Sequence Databases and Their Uses – The Mother Lode
- IV Currents, Cousins, and Cameos
- Epilogue – where next?
- Bibliography
- Glossary
- Index
13 - Extending the Core Problems
from III - Inexact Matching, Sequence Alignment, Dynamic Programming
Published online by Cambridge University Press: 23 June 2010
Summary
In this chapter we look in detail at alignment problems in the more complex contexts typical of string problems that currently arise in computational molecular biology. These more complex problems require techniques that extend (rather than refine) the core alignment methods.
Parametric sequence alignment
Introduction
When using sequence alignment methods to study DNA or amino acid sequences, there is often considerable disagreement about how to weight matches, mismatches, insertions and deletions (indels), and gaps. The most commonly used alignment software packages require the user to specify fixed values for those parameters, and it is widely observed that the biological significance of the resulting alignment can be greatly affected by the choice of parameter settings. The following relates to alignments of proteins from the globin family and is representative of frequently seen comments in the biological literature:
…one must be able to vary the gap and gap size penalties independently and in a query dependent fashion in order to obtain the maximal sensitivity of the search.
[81]
A similar comment appears in [432]:
Sequence alignment is sensitive to the choices of gap penalty and the form of the relatedness matrix, and it is often desirable to vary these …
Finally, from [446],
One of the most prominent problems is the choice of parametric values, especially gap penalties. When very similar sequences are compared, the choice is not critical; but when the conservation is low, the resulting alignment is strongly affected.
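The sensitivity described in these comments can be seen on a toy example. The following is a minimal sketch, not the chapter's method: it assumes a simplified objective in which each match earns `match`, each mismatch costs `mismatch`, and each inserted space costs `indel`, with no separate per-gap term. The function name `align` and its parameter names are hypothetical, chosen only for illustration.

```python
def align(s, t, match=1.0, mismatch=1.0, indel=1.0):
    """Global alignment maximizing
        match * (#matches) - mismatch * (#mismatches) - indel * (#spaces).
    Returns (score, matches, mismatches, spaces) for one optimal alignment.
    Simplifying assumption: spaces are charged individually, so there is
    no separate gap-initiation penalty.
    """
    n, m = len(s), len(t)
    # Each cell stores (score, matches, mismatches, spaces) for the best
    # alignment of the prefixes s[:i] and t[:j].
    V = [[None] * (m + 1) for _ in range(n + 1)]
    V[0][0] = (0.0, 0, 0, 0)
    for i in range(1, n + 1):
        V[i][0] = (-indel * i, 0, 0, i)
    for j in range(1, m + 1):
        V[0][j] = (-indel * j, 0, 0, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = V[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                diag = (d[0] + match, d[1] + 1, d[2], d[3])
            else:
                diag = (d[0] - mismatch, d[1], d[2] + 1, d[3])
            u = V[i - 1][j]
            up = (u[0] - indel, u[1], u[2], u[3] + 1)
            l = V[i][j - 1]
            left = (l[0] - indel, l[1], l[2], l[3] + 1)
            # Compare on score only; ties go to the diagonal move.
            V[i][j] = max(diag, up, left, key=lambda c: c[0])
    return V[n][m]


s, t = "ACGTACGT", "CGTACGTA"   # t is s shifted by one position
cheap = align(s, t, indel=0.5)   # cheap spaces: shift one string, 7 matches
costly = align(s, t, indel=10)   # costly spaces: no spaces, 8 mismatches
```

With the cheap setting, the optimal alignment uses two spaces to line up seven matching characters; with the costly setting, it uses no spaces and accepts eight mismatches. The same pair of strings therefore yields structurally different alignments depending only on the chosen parameter values.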
- Type: Chapter
- Information: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, pp. 312-331. Publisher: Cambridge University Press. Print publication year: 1997