Database Similarity Searching

Jin Xiong

doi:10.1017/CBO9780511806087.005

4 - Database Similarity Searching

Published online by Cambridge University Press: 05 June 2012


Jin Xiong, Affiliation: Texas A&M University


Summary

A major application of pairwise alignment is retrieving biological sequences from databases on the basis of similarity. The process involves submitting a query sequence and comparing it pairwise with every individual sequence in a database. Database similarity searching is thus pairwise alignment on a large scale. This type of searching is one of the most effective ways to assign putative functions to newly determined sequences. However, the dynamic programming method described in Chapter 3 is too slow to be practical in most cases, so special search methods are needed to speed up sequence comparison. This chapter discusses the theory and applications of these database searching methods.
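To make the scale problem concrete, the following is a minimal sketch of an exhaustive search that scores a query against every database entry by dynamic programming. It is an illustration only, not the chapter's own method: the function names, the toy database, and the scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (Smith-Waterman, linear gaps).

    The scoring values are illustrative assumptions, not taken from the text.
    Each call fills a len(a) x len(b) table, which is why scanning a whole
    database this way is slow.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + step,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query, database):
    """Score the query against every entry and rank the hits, best first."""
    scores = [(name, local_alignment_score(query, seq))
              for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy database; real databases hold millions of entries.
    toy_db = {"s1": "TTTT", "s2": "GGACGTGG", "s3": "ACCT"}
    for name, score in search_database("ACGT", toy_db):
        print(name, score)
```

The work per comparison grows with the product of the two sequence lengths, and it is repeated for every database entry, which is the cost that faster search methods aim to avoid.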

UNIQUE REQUIREMENTS OF DATABASE SEARCHING

Algorithms for sequence database searching must meet three requirements. The first is sensitivity, the ability to find as many correct hits as possible. It is measured by how many true members of the same sequence family the search correctly identifies. These correct hits are the “true positives” of the search. The second is selectivity, also called specificity, the ability to exclude incorrect hits. Incorrect hits are unrelated sequences mistakenly reported by the search and are considered “false positives.” The third is speed, the time it takes to obtain results from a database search. Depending on the size of the database, speed can sometimes be the primary concern.
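The first two criteria can be quantified once the true family members are known. The sketch below uses the common textbook formulas, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP); these formulas are an assumption brought in for illustration, since the passage above describes the criteria only in words. The function name and the sequence identifiers are hypothetical.

```python
def search_metrics(reported_hits, true_members, database_ids):
    """Sensitivity and specificity of one database search.

    reported_hits: IDs the search returned as hits
    true_members:  IDs that really belong to the query's family
    database_ids:  every ID in the searched database
    """
    reported = set(reported_hits)
    relevant = set(true_members)
    everything = set(database_ids)

    tp = len(reported & relevant)               # correct hits found
    fp = len(reported - relevant)               # unrelated sequences reported
    fn = len(relevant - reported)               # family members missed
    tn = len(everything - reported - relevant)  # unrelated and correctly excluded

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": sensitivity, "specificity": specificity}


if __name__ == "__main__":
    # Hypothetical example: 10 database entries, 4 true family members.
    db = [f"s{i}" for i in range(1, 11)]
    family = ["s1", "s2", "s3", "s4"]
    hits = ["s1", "s2", "s3", "s9"]
    print(search_metrics(hits, family, db))
```

Raising the score threshold for reporting a hit usually trades one measure for the other: fewer false positives, but more family members missed.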

Type: Chapter
Information: Essential Bioinformatics, pp. 51-62

DOI: https://doi.org/10.1017/CBO9780511806087.005

Publisher: Cambridge University Press

Print publication year: 2006


