Publisher: Cambridge University Press
Online publication date: May 2010
Print publication year: 2008
Online ISBN: 9780511754838

Book description

The recent explosive growth of biological data has led to a rapid increase in the number of molecular biology databases. Because these databases are held in many different locations and often use varying interfaces and non-standard data formats, integrating and comparing their data can be difficult and time-consuming. This book provides an overview of the key tools currently available for large-scale comparisons of gene sequences and annotations, focusing on the databases and tools from the University of California, Santa Cruz (UCSC), Ensembl, and the National Center for Biotechnology Information (NCBI). Written specifically for biology and bioinformatics students and researchers, it aims to give an appreciation of the methods by which the browsers and their databases are constructed, enabling readers to determine which tool is the most appropriate for their requirements. Each chapter contains a summary and exercises to aid understanding and promote effective use of these important tools.

Reviews

'The book would suit a bioinformatician wishing to gain an introduction into genome database querying and interaction.'

Source: Microbiology Today

'… provides a step-by-step account of how most commonly-used databases are compiled and updated, their applications and practical examples of how to use them. It is suitable for graduates and advanced undergraduates in bioinformatics or biology, or any researcher intent on exploiting the capabilities of databases as research tools more fully. … the great strength of this book is its focus on basic concepts, with an emphasis on how to obtain information, enabling the reader to find new things out for themselves.'

Source: Journal of Biological Education

Contents

References
Alfarano, C., Andrade, C. E., et al. (2005). “The Biomolecular Interaction Network Database and related tools 2005 update.” Nucleic Acids Res 33(Database issue): D418–24.
Ashurst, J. L., Chen, C. K., et al. (2005). “The Vertebrate Genome Annotation (Vega) database.” Nucleic Acids Res 33(Database issue): D459–65.
Asthana, S., Roytberg, M., et al. (2007). “Analysis of Sequence Conservation at Nucleotide Resolution.” PLoS Comput Biol 3(12): e254.
Barrett, J. C., Fry, B., et al. (2005). “Haploview: analysis and visualization of LD and haplotype maps.” Bioinformatics 21(2): 263–5.
Baxevanis, A. D. (2003). “Using genomic databases for sequence-based biological discovery.” Mol Med 9(9–12): 185–92.
Birkland, A. and Yona, G. (2006). “BIOZON: a hub of heterogeneous biological data.” Nucleic Acids Res 34(Database issue): D235–42.
Birney, E. (2003). “Ensembl: a genome infrastructure.” Cold Spring Harb Symp Quant Biol 68: 213–15.
Birney, E., Andrews, T. D., et al. (2004). “An overview of Ensembl.” Genome Res 14(5): 925–8.
Birney, E., Stamatoyannopoulos, J. A., et al. (2007). “Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.” Nature 447(7146): 799–816.
Bishop, A. C., Xu, J., et al. (2002). “Identification of the tRNA-dihydrouridine synthase family.” J Biol Chem 277(28): 25090–5.
Blake, J. A., Richardson, J. E., et al. (2003). “MGD: the Mouse Genome Database.” Nucleic Acids Res 31(1): 193–5.
Blanchette, M., Kent, W. J., et al. (2004). “Aligning multiple genomic sequences with the threaded blockset aligner.” Genome Res 14(4): 708–15.
Blankenberg, D., Taylor, J., et al. (2007). “A framework for collaborative analysis of ENCODE data: making large-scale analyses biologist-friendly.” Genome Res 17(6): 960–4.
Bray, N. and Pachter, L. (2004). “MAVID: constrained ancestral alignment of multiple sequences.” Genome Res 14(4): 693–9.
Brent, M. R. (2007). “How does eukaryotic gene prediction work?” Nat Biotechnol 25(8): 883–5.
Brudno, M., Do, C. B., et al. (2003a). “LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA.” Genome Res 13(4): 721–31.
Brudno, M., Malde, S., et al. (2003b). “Glocal alignment: finding rearrangements during alignment.” Bioinformatics 19(Suppl 1): i54–62.
Burge, C. and Karlin, S. (1997). “Prediction of complete gene structures in human genomic DNA.” J Mol Biol 268(1): 78–94.
Caspi, R., Foerster, H., et al. (2006). “MetaCyc: a multiorganism database of metabolic pathways and enzymes.” Nucleic Acids Res 34(Database issue): D511–16.
Choi, K., Ma, Y., et al. (2005). “PLATCOM: a Platform for Computational Comparative Genomics.” Bioinformatics 21(10): 2514–16.
Christie, K. R., Weng, S., et al. (2004). “Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms.” Nucleic Acids Res 32(Database issue): D311–14.
Cooper, G. M., Stone, E. A., et al. (2005). “Distribution and intensity of constraint in mammalian genomic sequence.” Genome Res 15(7): 901–13.
Curwen, V., Eyras, E., et al. (2004). “The Ensembl automatic gene annotation system.” Genome Res 14(5): 942–50.
Dewey, C. N. and Pachter, L. (2006). “Evolution at the nucleotide level: the problem of multiple whole-genome alignment.” Hum Mol Genet 15(Spec No 1): R51–6.
Dowell, R. D., Jokerst, R. M., et al. (2001). “The distributed annotation system.” BMC Bioinformatics 2: 7.
Drmanac, R., Labat, I., et al. (1989). “Sequencing of megabase plus DNA by hybridization: theory of the method.” Genomics 4(2): 114–28.
DuBois, P. (2005). MySQL, Sams Developer's Library.
Durbin, R., Eddy, S., et al. (1998). Biological Sequence Analysis, Cambridge University Press.
Eilbeck, K., Lewis, S. E., et al. (2005). “The Sequence Ontology: a tool for the unification of genome annotations.” Genome Biol 6(5): R44.
Eppig, J. T., Blake, J. A., et al. (2007). “The mouse genome database (MGD): new features facilitating a model system.” Nucleic Acids Res 35(Database issue): D630–7.
Flicek, P., Aken, B. L., et al. (2008). “Ensembl 2008.” Nucleic Acids Res 36(Database issue): D707–14.
Frazer, K. A., Ballinger, D. G., et al. (2007). “A second generation human haplotype map of over 3.1 million SNPs.” Nature 449(7164): 851–61.
Frazer, K. A., Pachter, L., et al. (2004). “VISTA: computational tools for comparative genomics.” Nucleic Acids Res 32(Web Server issue): W273–9.
Furey, T. S. (2006). “Comparison of human (and other) genome browsers.” Hum Genomics 2(4): 266–70.
Furey, T. S., Diekhans, M., et al. (2004). “Analysis of human mRNAs with the reference genome sequence reveals potential errors, polymorphisms, and RNA editing.” Genome Res 14(10B): 2034–40.
Gerstein, M. B., Bruce, C., et al. (2007). “What is a gene, post-ENCODE? History and updated definition.” Genome Res 17(6): 669–81.
Giardine, B., Riemer, C., et al. (2005). “Galaxy: a platform for interactive large-scale genome analysis.” Genome Res 15(10): 1451–5.
Gilbert, D. G. (2007). “DroSpeGe: rapid access database for new Drosophila species genomes.” Nucleic Acids Res 35(Database issue): D480–5.
Green, P. (2007). “2x genomes: does depth matter?” Genome Res 17(11): 1547–9.
Green, R. E., Krause, J., et al. (2006). “Analysis of one million base pairs of Neanderthal DNA.” Nature 444(7117): 330–6.
Green, R. E., Lewis, B. P., et al. (2003). “Widespread predicted nonsense-mediated mRNA decay of alternatively-spliced transcripts of human normal and disease genes.” Bioinformatics 19(Suppl 1): i118–21.
Griffiths-Jones, S., Moxon, S., et al. (2005). “Rfam: annotating non-coding RNAs in complete genomes.” Nucleic Acids Res 33(Database issue): D121–4.
Gross, S. S. and Brent, M. R. (2006). “Using multiple alignments to improve gene prediction.” J Comput Biol 13(2): 379–93.
Harrow, J., Denoeud, F., et al. (2006). “GENCODE: producing a reference annotation for ENCODE.” Genome Biol 7(Suppl 1): S4 1–9.
Holzner, S. (1999). Perl Core Language, Coriolis.
Hoon, S., Ratnapu, K. K., et al. (2003). “Biopipe: a flexible framework for protocol-based bioinformatics analysis.” Genome Res 13(8): 1904–15.
Hsu, F., Pringle, T. H., et al. (2005). “The UCSC Proteome Browser.” Nucleic Acids Res 33(Database issue): D454–8.
Hubbard, T. J., Aken, B. L., et al. (2007). “Ensembl 2007.” Nucleic Acids Res 35(Database issue): D610–17.
Hull, D., Wolstencroft, K., et al. (2006). “Taverna: a tool for building and running workflows of services.” Nucleic Acids Res 34(Web Server issue): W729–32.
Hüttenhofer, A., Schattner, P., et al. (2005). “Non-coding RNAs: hope or hype?” Trends Genet 21(5): 289–97.
Iafrate, A. J., Feuk, L., et al. (2004). “Detection of large-scale variation in the human genome.” Nat Genet 36(9): 949–51.
Jaiswal, P., Ni, J., et al. (2006). “Gramene: a bird's eye view of cereal genomes.” Nucleic Acids Res 34(Database issue): D717–23.
Kapustin, Yu., Souvorov, A., et al. (2004). “Splign – a hybrid approach to spliced alignments.” RECOMB 2004 – Currents in Comp Mol Bio: 741.
Karolchik, D., Hinrichs, A. S., et al. (2004). “The UCSC Table Browser data retrieval tool.” Nucleic Acids Res 32(Database issue): D493–6.
Karolchik, D., Kuhn, R. M., et al. (2008). “The UCSC Genome Browser Database: 2008 Update.” Nucleic Acids Res 36(Database issue): D773–9.
Kartalov, E. P. and Quake, S. R. (2004). “Microfluidic device reads up to four consecutive base pairs in DNA sequencing-by-synthesis.” Nucleic Acids Res 32(9): 2873–9.
Kasprzyk, A., Keefe, D., et al. (2004). “EnsMart: a generic system for fast and flexible access to biological data.” Genome Res 14(1): 160–9.
Kent, W. J. (2002). “BLAT – the BLAST-like alignment tool.” Genome Res 12(4): 656–64.
Kent, W. J., Baertsch, R., et al. (2003). “Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes.” Proc Natl Acad Sci U S A 100(20): 11484–9.
Kent, W. J., Hsu, F., et al. (2005). “Exploring relationships and mining data with the UCSC Gene Sorter.” Genome Res 15(5): 737–41.
Korbel, J. O., Urban, A. E., et al. (2007). “Paired-end mapping reveals extensive structural variation in the human genome.” Science 318(5849): 420–6.
Kuhn, R. M., Karolchik, D., et al. (2007). “The UCSC Genome Browser database: update 2007.” Nucleic Acids Res 35(Database issue): D668–73.
Leamon, J. H. and Rothberg, J. M. (2007). “Cramming more sequencing reactions onto microreactor chips.” Chem Rev 107(8): 3367–76.
Lee, T. J., Pouliot, Y., et al. (2006). “BioWarehouse: a bioinformatics database warehouse toolkit.” BMC Bioinformatics 7: 170.
Lestrade, L. and Weber, M. J. (2006). “snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs.” Nucleic Acids Res 34(Database issue): D158–62.
Lev-Maor, G., Sorek, R., et al. (2003). “The birth of an alternatively spliced exon: 3′ splice-site selection in Alu exons.” Science 300(5623): 1288–91.
Levanon, E. Y., Eisenberg, E., et al. (2004). “Systematic identification of abundant A-to-I editing sites in the human transcriptome.” Nat Biotechnol 22(8): 1001–5.
Lewis, S. E., Searle, S. M., et al. (2002). “Apollo: a sequence annotation editor.” Genome Biol 3(12): RESEARCH0082.
Ma, B., Tromp, J., et al. (2002). “PatternHunter: faster and more sensitive homology search.” Bioinformatics 18(3): 440–5.
Margulies, E. H., Cooper, G. M., et al. (2007). “Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome.” Genome Res 17(6): 760–74.
Markowitz, V. M., Korzeniewski, F., et al. (2006a). “The integrated microbial genomes (IMG) system.” Nucleic Acids Res 34(Database issue): D344–8.
Markowitz, V. M., Ivanova, N., et al. (2006b). “An experimental metagenome data management and analysis system.” Bioinformatics 22(14): e359–67.
Markowitz, V. M., Ivanova, N. N., et al. (2008). “IMG/M: a data management and analysis system for metagenomes.” Nucleic Acids Res 36(Database issue): D534–8.
Mount, D. W. (2004). Bioinformatics: Sequence and Genome Analysis, 2nd Edition, Cold Spring Harbor Laboratory Press.
Mungall, C. J. and Emmert, D. B. (2007). “A Chado case study: an ontology-based modular schema for representing genome-associated biological information.” Bioinformatics 23(13): i337–46.
Noonan, J. P., Hofreiter, M., et al. (2005). “Genomic sequencing of Pleistocene cave bears.” Science 309(5734): 597–9.
Oinn, T., Addis, M., et al. (2004). “Taverna: a tool for the composition and enactment of bioinformatics workflows.” Bioinformatics 20(17): 3045–54.
Olson, M. (2007). “Enrichment of super-sized resequencing targets from the human genome.” Nat Methods 4(11): 891–2.
Pedersen, J. S., Bejerano, G., et al. (2006). “Identification and classification of conserved RNA secondary structures in the human genome.” PLoS Comput Biol 2(4): e33.
Pond, S. L., Frost, S. D., et al. (2005). “HyPhy: hypothesis testing using phylogenies.” Bioinformatics 21(5): 676–9.
Pontius, J. U., Mullikin, J. C., et al. (2007). “Initial sequence and comparative analysis of the cat genome.” Genome Res 17(11): 1675–89.
Potter, S. C., Clarke, L., et al. (2004). “The Ensembl analysis pipeline.” Genome Res 14(5): 934–41.
Prakash, A. and Tompa, M. (2007). “Measuring the accuracy of genome-size multiple alignments.” Genome Biol 8(6): R124.
Primrose, S. B. and Twyman, R. M. (2006). Principles of Gene Manipulation and Genomics, 7th Edition, Blackwell Publishing.
Pruitt, K. D., Tatusova, T., et al. (2007). “NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.” Nucleic Acids Res 35(Database issue): D61–5.
Rampp, M., Soddemann, T., et al. (2006). “The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis.” Nucleic Acids Res 34(Web Server issue): W15–19.
Reek, K. A. (1998). Pointers on C, Addison-Wesley.
Reich, M., Liefeld, T., et al. (2006). “GenePattern 2.0.” Nat Genet 38(5): 500–1.
Rice, P., Longden, I., et al. (2000). “EMBOSS: the European Molecular Biology Open Software Suite.” Trends Genet 16(6): 276–7.
Rogers, Y. H. and Venter, J. C. (2005). “Genomics: massively parallel sequencing.” Nature 437(7057): 326–7.
Schattner, P., Barberan-Soler, S., et al. (2006). “A computational screen for mammalian pseudouridylation guide H/ACA RNAs.” RNA 12(1): 15–25.
Schneider, K. L., Pollard, K. S., et al. (2006). “The UCSC Archaeal Genome Browser.” Nucleic Acids Res 34(Database issue): D407–10.
Schwartz, S., Kent, W. J., et al. (2003). “Human-mouse alignments with BLASTZ.” Genome Res 13(1): 103–7.
Schwarz, E. M., Antoshechkin, I., et al. (2006). “WormBase: better software, richer content.” Nucleic Acids Res 34(Database issue): D475–8.
Sebat, J., Lakshmi, B., et al. (2004). “Large-scale copy number polymorphism in the human genome.” Science 305(5683): 525–8.
Shah, S. P., Huang, Y., et al. (2005). “Atlas – a data warehouse for integrative bioinformatics.” BMC Bioinformatics 6: 34.
Siepel, A., Bejerano, G., et al. (2005). “Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes.” Genome Res 15(8): 1034–50.
Slater, G. S. and Birney, E. (2005). “Automated generation of heuristics for biological sequence comparison.” BMC Bioinformatics 6: 31.
Southern, E. M., Maskos, U., et al. (1992). “Analyzing and comparing nucleic acid sequences by hybridization to arrays of oligonucleotides: evaluation using experimental models.” Genomics 13(4): 1008–17.
Stabenau, A., McVicker, G., et al. (2004). “The Ensembl core software libraries.” Genome Res 14(5): 929–33.
Stajich, J. E., Block, D., et al. (2002). “The Bioperl toolkit: Perl modules for the life sciences.” Genome Res 12(10): 1611–18.
Stajich, J. E. and Lapp, H. (2006). “Open source tools and toolkits for bioinformatics: significance, and where are we?” Brief Bioinform 7(3): 287–96.
Stein, L. D. (2003). “Integrating biological databases.” Nat Rev Genet 4(5): 337–45.
Stein, L. D., Mungall, C., et al. (2002). “The generic genome browser: a building block for a model organism system database.” Genome Res 12(10): 1599–610.
Stevens, R. D., Tipney, H. J., et al. (2004). “Exploring Williams-Beuren syndrome using myGrid.” Bioinformatics 20(Suppl 1): i303–10.
Subramaniam, S. (1998). “The Biology Workbench – a seamless database and analysis environment for the biologist.” Proteins 32(1): 1–2.
Sundquist, A., Ronaghi, M., et al. (2007). “Whole-genome sequencing and assembly with high-throughput, short-read technologies.” PLoS ONE 2(5): e484.
Thomas, D. J., Rosenbloom, K. R., et al. (2007). “The ENCODE Project at UC Santa Cruz.” Nucleic Acids Res 35(Database issue): D663–7.
Thornton, J. W., Need, E., et al. (2003). “Resurrecting the ancestral steroid receptor: ancient origin of estrogen signaling.” Science 301(5640): 1714–17.
Tisdall, J. D. (2001). Beginning Perl for Bioinformatics, O'Reilly.
Tisdall, J. D. (2003). Mastering Perl for Bioinformatics, O'Reilly.
Wheeler, D. L., Barrett, T., et al. (2005). “Database resources of the National Center for Biotechnology Information.” Nucleic Acids Res 33(Database issue): D39–45.
Wheeler, D. L., Barrett, T., et al. (2006). “Database resources of the National Center for Biotechnology Information.” Nucleic Acids Res 34(Database issue): D173–80.
Wheeler, D. L., Barrett, T., et al. (2007). “Database resources of the National Center for Biotechnology Information.” Nucleic Acids Res 35(Database issue): D5–12.
Wheeler, D. L., Barrett, T., et al. (2008). “Database resources of the National Center for Biotechnology Information.” Nucleic Acids Res 36(Database issue): D13–21.
Will, C. L. and Lührmann, R. (2005). “Splicing of a rare class of introns by the U12-dependent spliceosome.” Biol Chem 386(8): 713–24.
Zdobnov, E. M., Lopez, R., et al. (2002). “The EBI SRS server – recent developments.” Bioinformatics 18(2): 368–73.
Zimmermann, O., Tomlinson, M., et al. (2005). Perspectives on Web Services, Springer.
