Book contents
- Frontmatter
- Contents
- Preface
- Acknowledgments
- 1 The Central Dogma
- 2 RNA Secondary Structure
- 3 Comparing DNA Sequences
- 4 Predicting Species: Statistical Models
- 5 Substitution Matrices for Amino Acids
- 6 Sequence Databases
- 7 Local Alignment and the BLAST Heuristic
- 8 Statistics of BLAST Database Searches
- 9 Multiple Sequence Alignment I
- 10 Multiple Sequence Alignment II
- 11 Phylogeny Reconstruction
- 12 Protein Motifs and PROSITE
- 13 Fragment Assembly
- 14 Coding Sequence Prediction with Dicodons
- 15 Satellite Identification
- 16 Restriction Mapping
- 17 Rearranging Genomes: Gates and Hurdles
- A Drawing RNA Cloverleaves
- B Space-Saving Strategies for Alignment
- C A Data Structure for Disjoint Sets
- D Suggestions for Further Reading
- Bibliography
- Index
9 - Multiple Sequence Alignment I
Published online by Cambridge University Press: 05 June 2012
Summary
Once a family of homologous proteins has been identified, it is often useful to arrange their sequences in a multiple alignment such as the one in Figure 9.1.
A multiple alignment is useful for constructing a so-called consensus sequence, which – while probably differing from every individual sequence in the family – is nonetheless a better representative of the family than any of its actual members. Multiple alignments can also form the basis of more abstract statistical models of the protein family called profiles.
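As a minimal sketch of the idea, a consensus can be read off column by column by taking the most frequent symbol. The function name `consensus`, the plurality rule, the treatment of the gap character as an ordinary symbol, and the toy sequences are all assumptions made for illustration; the chapter itself may define the consensus differently.

```python
from collections import Counter


def consensus(alignment):
    """Return the most common symbol in each column of an alignment.

    `alignment` is a list of equal-length strings (gaps written as '-').
    Ties are broken in favour of the symbol seen first in the column.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    result = []
    for column in zip(*alignment):
        symbol, _count = Counter(column).most_common(1)[0]
        result.append(symbol)
    return "".join(result)


# Hypothetical toy family: the consensus differs from every row.
rows = ["GARFIELT", "GAR-IELD", "GCRFIEND", "HARFIELD"]
print(consensus(rows))  # GARFIELD
```

Each toy row departs from the others at one position, so the plurality vote yields a string that matches none of them exactly.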
By examining which elements of the consensus are present in most or all family members and which exhibit a greater degree of variability, we can also find clues to the protein's function. Highly conserved regions are likely to have been conserved because they form active sites crucial to function, while more variable regions are more likely to have merely structural roles.
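One simple way to quantify this, offered here only as an illustrative sketch, is to score each column by the fraction of rows that share its most common symbol. The name `column_conservation` and the toy sequences are hypothetical, and more refined measures (for example, entropy-based ones) are equally possible.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column fraction of rows that carry the most common symbol.

    A value of 1.0 marks a perfectly conserved column; lower values
    mark more variable columns. Gaps ('-') count as ordinary symbols.
    """
    n_rows = len(alignment)
    scores = []
    for column in zip(*alignment):
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / n_rows)
    return scores


# Hypothetical toy family of four aligned sequences.
rows = ["GARFIELT", "GAR-IELD", "GCRFIEND", "HARFIELD"]
for position, score in enumerate(column_conservation(rows), start=1):
    print(position, score)
```

In the toy family, columns 3, 5, and 6 score 1.0 and would be the candidates for conserved positions.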
We have already seen in Chapter 3 that the number of ways in which a mere two sequences of only moderate length can be aligned is comparable to current estimates of the number of atoms in the observable universe. The addition of more sequences only increases the number of possibilities. We need both a criterion for evaluating multiple alignments and a computational strategy that will allow us to eliminate large sets of alignments at one stroke.
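To get a feel for this growth, the sketch below counts pairwise alignments under one common convention: every column is either a pair of residues, a residue over a gap, or a gap over a residue, and columns of two gaps are not allowed. This convention and the name `count_alignments` are assumptions; Chapter 3 may count alignments differently, so the exact figures need not match the book's.

```python
def count_alignments(m, n):
    """Count alignments of two sequences of lengths m and n.

    Uses the recurrence
        N(i, j) = N(i-1, j) + N(i, j-1) + N(i-1, j-1),
    with N(i, 0) = N(0, j) = 1: the last column pairs two residues,
    or places a residue of one sequence against a gap in the other.
    """
    prev = [1] * (n + 1)          # row i = 0
    for _ in range(m):
        cur = [1]                 # column j = 0
        for j in range(1, n + 1):
            cur.append(prev[j] + cur[j - 1] + prev[j - 1])
        prev = cur
    return prev[n]


for length in (1, 2, 3, 10, 50, 100):
    print(length, count_alignments(length, length))
```

Python's arbitrary-precision integers keep the counts exact even when they run to dozens of digits.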
To describe our evaluation criterion, we will rely on the notion of projection of a multiple alignment.
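The summary does not spell out the definition, so the following is a sketch of the usual one, stated as an assumption: the projection of a multiple alignment onto two of its sequences keeps only those two rows and discards every column in which both rows hold a gap. The name `project` and the toy alignment are hypothetical.

```python
def project(alignment, i, j, gap="-"):
    """Project a multiple alignment onto rows i and j.

    Keeps the two chosen rows and drops each column in which both
    rows contain the gap character, leaving a pairwise alignment.
    """
    kept = [
        (a, b)
        for a, b in zip(alignment[i], alignment[j])
        if not (a == gap and b == gap)
    ]
    row_i = "".join(a for a, _ in kept)
    row_j = "".join(b for _, b in kept)
    return row_i, row_j


# Hypothetical three-sequence alignment.
rows = ["GAR-FIELD", "GA--FIELD", "G-RAFIELD"]
print(project(rows, 0, 1))  # ('GARFIELD', 'GA-FIELD')
```

The fourth column holds a gap in both of the first two rows, so it vanishes from their projection while remaining in the projections that involve the third row.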
Genomic Perl: From Bioinformatics Basics to Working Code, pp. 127-140. Publisher: Cambridge University Press. Print publication year: 2002.