Book contents
- Frontmatter
- Contents
- Preface
- SECTION I INTRODUCTION AND BIOLOGICAL DATABASES
- SECTION II SEQUENCE ALIGNMENT
- 3 Pairwise Sequence Alignment
- 4 Database Similarity Searching
- 5 Multiple Sequence Alignment
- 6 Profiles and Hidden Markov Models
- 7 Protein Motifs and Domain Prediction
- SECTION III GENE AND PROMOTER PREDICTION
- SECTION IV MOLECULAR PHYLOGENETICS
- SECTION V STRUCTURAL BIOINFORMATICS
- SECTION VI GENOMICS AND PROTEOMICS
- APPENDIX
- Index
- Plate section
- References
4 - Database Similarity Searching
Published online by Cambridge University Press: 05 June 2012
Summary
A main application of pairwise alignment is retrieving biological sequences from databases based on similarity. This process involves submitting a query sequence and comparing it pairwise with every individual sequence in a database. Thus, database similarity searching is pairwise alignment on a large scale. This type of searching is one of the most effective ways to assign putative functions to newly determined sequences. However, the dynamic programming method described in Chapter 3 is too slow to be practical in most cases. Special search methods are needed to speed up the computational process of sequence comparison. The theory and applications of these database searching methods are discussed in this chapter.
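To make the idea of "pairwise alignment on a large scale" concrete, the sketch below scores a query against every entry of a toy database using a basic local alignment by dynamic programming. It is illustrative only: the function names, the scoring values (match 2, mismatch -1, linear gap -2), and the toy sequences are assumptions made for this example and are not taken from the chapter.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (dynamic programming, linear gaps).

    The scoring values are illustrative assumptions, not recommended settings.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query, database):
    """Compare the query with every database sequence and rank hits by score."""
    hits = [(name, local_alignment_score(query, seq))
            for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database; real databases hold millions of sequences.
toy_database = {
    "seq_a": "ACGTACGTGA",
    "seq_b": "TTTTTTTTTT",
    "seq_c": "GGACGTACCA",
}

ranked = search_database("ACGTAC", toy_database)
for name, score in ranked:
    print(name, score)
```

Each comparison fills a matrix whose size is the product of the two sequence lengths, and the search repeats that work for every database entry. This cost is why exhaustive dynamic programming becomes impractical as databases grow.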
UNIQUE REQUIREMENTS OF DATABASE SEARCHING
There are unique requirements for implementing algorithms for sequence database searching. The first criterion is sensitivity, which refers to the ability to find as many correct hits as possible. It is measured by how many of the true members of a sequence family are included among the hits. These correct hits are considered “true positives” in the database searching exercise. The second criterion is selectivity, also called specificity, which refers to the ability to exclude incorrect hits. These incorrect hits are unrelated sequences mistakenly identified in database searching and are considered “false positives.” The third criterion is speed, which is the time it takes to get results from database searches. For large databases, speed can become a primary concern.
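The first two criteria can be expressed numerically once the true family members are known. The sketch below is one common way to do so, not a definition given in this chapter: it assumes sensitivity is computed as TP / (TP + FN) and selectivity as TP / (TP + FP). Some texts instead define specificity as TN / (TN + FP). The function name and the sequence identifiers are hypothetical.

```python
def evaluate_search(reported_hits, true_family):
    """Return (sensitivity, selectivity) for one database search.

    Assumed definitions (not from the chapter):
      sensitivity = TP / (TP + FN)
      selectivity = TP / (TP + FP)
    """
    reported = set(reported_hits)
    family = set(true_family)
    tp = len(reported & family)   # correct hits (true positives)
    fp = len(reported - family)   # unrelated sequences reported (false positives)
    fn = len(family - reported)   # family members that were missed (false negatives)
    sensitivity = tp / (tp + fn) if family else 0.0
    selectivity = tp / (tp + fp) if reported else 0.0
    return sensitivity, selectivity


# Hypothetical identifiers: five true family members, four reported hits.
family_members = ["p1", "p2", "p3", "p4", "p5"]
search_hits = ["p1", "p2", "p3", "x1"]

sensitivity, selectivity = evaluate_search(search_hits, family_members)
print(f"sensitivity = {sensitivity:.2f}, selectivity = {selectivity:.2f}")
```

In this toy case the search recovers three of five family members and reports one unrelated sequence. A looser score cutoff would typically recover more family members at the cost of more false positives, so the two measures usually have to be balanced against each other.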
- Type: Chapter
- Information: Essential Bioinformatics, pp. 51-62
- Publisher: Cambridge University Press
- Print publication year: 2006