Book contents
- Frontmatter
- Contents
- Preface
- Guide to the chapters
- Acknowledgment of support
- Part I Introduction to the four themes
- Part II Studies on the four themes
- 5 Parametric Inference
- 6 Polytope Propagation on Graphs
- 7 Parametric Sequence Alignment
- 8 Bounds for Optimal Sequence Alignment
- 9 Inference Functions
- 10 Geometry of Markov Chains
- 11 Equations Defining Hidden Markov Models
- 12 The EM Algorithm for Hidden Markov Models
- 13 Homology Mapping with Markov Random Fields
- 14 Mutagenetic Tree Models
- 15 Catalog of Small Trees
- 16 The Strand Symmetric Model
- 17 Extending Tree Models to Splits Networks
- 18 Small Trees and Generalized Neighbor-Joining
- 19 Tree Construction using Singular Value Decomposition
- 20 Applications of Interval Methods to Phylogenetics
- 21 Analysis of Point Mutations in Vertebrate Genomes
- 22 Ultra-Conserved Elements in Vertebrate and Fly Genomes
- References
- Index
8 - Bounds for Optimal Sequence Alignment
from Part II - Studies on the four themes
Published online by Cambridge University Press: 04 August 2010
Summary
One of the most frequently used techniques for determining the similarity between biological sequences is optimal sequence alignment. In the standard instance of the sequence alignment problem, we are given two sequences (usually DNA or protein sequences) that have evolved from a common ancestor via a series of mutations, insertions and deletions. The goal is to find the best alignment between the two sequences. The definition of “best” depends on the choice of scoring scheme, and there is often disagreement about the correct choice. Parametric sequence alignment circumvents this problem by computing the optimal alignment as a function of variable scores. In this chapter, we address one such scheme, in which all matches are equally rewarded, all mismatches are equally penalized and all spaces are equally penalized. An efficient parametric sequence alignment algorithm is described in Chapter 7. Here we address the structure of the set of different alignments, and in particular the number of different alignments of two given sequences that can be optimal for some choice of scores. For a detailed treatment of sequence alignment, we refer the reader to [Gusfield, 1997].
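The parametric idea in the summary can be illustrated with a brute-force sketch. It is not the Chapter 7 algorithm. It runs an ordinary Needleman-Wunsch dynamic program and scores an alignment as `match*m - mismatch*x - space*s`, where m, x and s count matches, mismatches and spaces. The parameter names, this exact score formula and the sweep grid are our assumptions. The sketch then re-runs the program over a grid of mismatch and space penalties, with the match reward fixed at 1, and collects the (m, x, s) summaries of the optimal alignments it finds.

```python
from fractions import Fraction


def align_summary(seq1, seq2, match=1, mismatch=1, space=1):
    """Needleman-Wunsch global alignment under the assumed score
    match*m - mismatch*x - space*s (m, x, s = matches, mismatches, spaces).
    Returns (best_score, (m, x, s)) for one optimal alignment."""
    n1, n2 = len(seq1), len(seq2)
    # best[i][j] = (score, summary) of an optimal alignment of seq1[:i], seq2[:j]
    best = [[None] * (n2 + 1) for _ in range(n1 + 1)]
    best[0][0] = (0, (0, 0, 0))
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:  # pair seq1[i-1] with seq2[j-1]
                score, (m, x, s) = best[i - 1][j - 1]
                if seq1[i - 1] == seq2[j - 1]:
                    candidates.append((score + match, (m + 1, x, s)))
                else:
                    candidates.append((score - mismatch, (m, x + 1, s)))
            if i > 0:  # seq1[i-1] against a space
                score, (m, x, s) = best[i - 1][j]
                candidates.append((score - space, (m, x, s + 1)))
            if j > 0:  # a space against seq2[j-1]
                score, (m, x, s) = best[i][j - 1]
                candidates.append((score - space, (m, x, s + 1)))
            # Equal scores are broken by comparing the summary tuples.
            best[i][j] = max(candidates)
    return best[n1][n2]


def optimal_summaries(seq1, seq2, grid=range(17)):
    """Hypothetical parameter sweep: match fixed at 1, mismatch and space
    penalties in {0, 1/4, ..., 4}; exact Fractions avoid rounding ties."""
    found = set()
    for a in grid:
        for b in grid:
            _, summary = align_summary(seq1, seq2, match=1,
                                       mismatch=Fraction(a, 4),
                                       space=Fraction(b, 4))
            found.add(summary)
    return found


print(optimal_summaries("AGTTC", "ATGCT"))
```

A finite grid can miss optimal summaries that appear only in narrow parameter regions, and only one optimal alignment is recorded per grid point. The sweep therefore gives a lower bound on the number of distinct optima, not an exact count.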
Alignments and optimality
We first review some notation from Section 2.2. In this chapter, all alignments will be global alignments between two sequences σ1 and σ2 of the same length n.
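For two sequences of common length n, every column of a global alignment either pairs two characters (a match or a mismatch) or pairs one character with a space. The counts therefore always satisfy 2(m + x) + s = 2n. The sketch below uses our own names m, x and s, which are not necessarily the notation of Section 2.2. It computes, by dynamic programming over sets, every (m, x, s) triple realized by some global alignment of σ1 and σ2. Under a score that depends only on these counts, an optimal alignment's summary is a maximizer over this set.

```python
def alignment_summaries(sigma1, sigma2):
    """Return the set of (m, x, s) triples (matches, mismatches, spaces)
    realised by at least one global alignment of sigma1 and sigma2."""
    n1, n2 = len(sigma1), len(sigma2)
    # table[i][j] = summaries of all alignments of sigma1[:i] and sigma2[:j]
    table = [[set() for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    table[0][0].add((0, 0, 0))
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            cell = table[i][j]
            if i > 0 and j > 0:  # last column pairs two characters
                same = sigma1[i - 1] == sigma2[j - 1]
                for m, x, s in table[i - 1][j - 1]:
                    cell.add((m + 1, x, s) if same else (m, x + 1, s))
            if i > 0:  # last column: character of sigma1 over a space
                for m, x, s in table[i - 1][j]:
                    cell.add((m, x, s + 1))
            if j > 0:  # last column: space over a character of sigma2
                for m, x, s in table[i][j - 1]:
                    cell.add((m, x, s + 1))
    return table[n1][n2]


sigma1, sigma2 = "AT", "TA"
n = len(sigma1)
summaries = alignment_summaries(sigma1, sigma2)
# Each column consumes two characters (match/mismatch) or one (space),
# so 2*(m + x) + s == 2*n for sequences of common length n.
assert all(2 * (m + x) + s == 2 * n for m, x, s in summaries)
print(sorted(summaries))  # [(0, 0, 4), (0, 1, 2), (0, 2, 0), (1, 0, 2)]
```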
Algebraic Statistics for Computational Biology, pp. 206-214. Publisher: Cambridge University Press. Print publication year: 2005.