Book contents
- Frontmatter
- Contents
- Preface
- I Exact String Matching: The Fundamental String Problem
- II Suffix Trees and Their Uses
- III Inexact Matching, Sequence Alignment, Dynamic Programming
- 10 The Importance of (Sub)sequence Comparison in Molecular Biology
- 11 Core String Edits, Alignments, and Dynamic Programming
- 12 Refining Core String Edits and Alignments
- 13 Extending the Core Problems
- 14 Multiple String Comparison – The Holy Grail
- 15 Sequence Databases and Their Uses – The Mother Lode
- IV Currents, Cousins, and Cameos
- Epilogue – where next?
- Bibliography
- Glossary
- Index
11 - Core String Edits, Alignments, and Dynamic Programming
from III - Inexact Matching, Sequence Alignment, Dynamic Programming
Published online by Cambridge University Press: 23 June 2010
Summary
Introduction
In this chapter we consider the inexact matching and alignment problems that form the core of the field of inexact matching, along with others that illustrate its most general techniques. Some of these problems and techniques will be further refined and extended in the following chapters. We start with a detailed examination of the most classic inexact matching problem solved by dynamic programming: the edit distance problem. The motivation for inexact matching (and, more generally, sequence comparison) in molecular biology is a recurring theme explored throughout the rest of the book, and we will discuss many specific examples of how string comparison and inexact matching are used in current molecular biology. To begin, however, we concentrate on the purely formal and technical aspects of defining and computing inexact matching.
The edit distance between two strings
Frequently, one wants a measure of the difference or distance between two strings (for example, in evolutionary, structural, or functional studies of biological strings; in textual database retrieval; or in spelling-correction methods). There are several ways to formalize the notion of distance between strings. One common and simple formalization [389, 299], called edit distance, focuses on transforming (or editing) one string into the other by a series of edit operations on individual characters. The permitted edit operations are the insertion of a character into the first string, the deletion of a character from the first string, and the substitution (or replacement) of a character in the first string with a character in the second string.
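Since the edit distance problem is solved by dynamic programming, a short sketch may help make the definition concrete. It assumes unit cost for every insertion, deletion, and substitution and zero cost for a match, and fills a table `D` in which `D[i][j]` is the edit distance between the first `i` characters of `s1` and the first `j` characters of `s2`. The function name `edit_distance` and the one-letter transcript codes (`M` match, `R` replace, `I` insert, `D` delete) are hypothetical choices made for this illustration, not notation taken from the text above.

```python
from typing import Tuple


def edit_distance(s1: str, s2: str) -> Tuple[int, str]:
    """Return (distance, transcript) for transforming s1 into s2.

    Sketch assumption: insertion, deletion, and substitution each cost 1;
    a match costs 0. Transcript letters are illustrative:
    M = match, R = replace, I = insert, D = delete.
    """
    n, m = len(s1), len(s2)
    # D[i][j] = edit distance between prefixes s1[:i] and s2[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # delete all i characters of s1[:i]
    for j in range(m + 1):
        D[0][j] = j  # insert all j characters of s2[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if s1[i - 1] == s2[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j] + 1,             # delete s1[i-1]
                D[i][j - 1] + 1,             # insert s2[j-1]
                D[i - 1][j - 1] + mismatch,  # match or substitute
            )

    # Trace back from D[n][m] to recover one optimal sequence of operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = 0 if s1[i - 1] == s2[j - 1] else 1
            if D[i][j] == D[i - 1][j - 1] + mismatch:
                ops.append("R" if mismatch else "M")
                i, j = i - 1, j - 1
                continue
        if i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append("D")
            i -= 1
        else:
            ops.append("I")
            j -= 1
    ops.reverse()
    return D[n][m], "".join(ops)


if __name__ == "__main__":
    dist, transcript = edit_distance("kitten", "sitting")
    print(dist, transcript)
```

For example, `edit_distance("kitten", "sitting")[0]` is 3. Filling the table takes time and space proportional to the product of the two string lengths; when several optimal transcripts exist, this traceback returns just one of them, preferring a diagonal step when it is optimal.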
- Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, pp. 215-253. Publisher: Cambridge University Press. Print publication year: 1997.