Genomes, Browsers and Databases: Data-Mining Tools for Integrated Genomic Databases

Peter Schattner

doi:10.1017/CBO9780511754838

References

Alfarano, C., Andrade, C. E., et al. (2005). “The Biomolecular Interaction Network Database and related tools 2005 update.” Nucleic Acids Res 33(Database issue): D418–24.

Ashurst, J. L., Chen, C. K., et al. (2005). “The Vertebrate Genome Annotation (Vega) database.” Nucleic Acids Res 33(Database issue): D459–65.

Asthana, S., Roytberg, M., et al. (2007). “Analysis of sequence conservation at nucleotide resolution.” PLoS Comput Biol 3(12): e254.

Barrett, J. C., Fry, B., et al. (2005). “Haploview: analysis and visualization of LD and haplotype maps.” Bioinformatics 21(2): 263–5.

Baxevanis, A. D. (2003). “Using genomic databases for sequence-based biological discovery.” Mol Med 9(9–12): 185–92.

Birkland, A. and Yona, G. (2006). “BIOZON: a hub of heterogeneous biological data.” Nucleic Acids Res 34(Database issue): D235–42.

Birney, E. (2003). “Ensembl: a genome infrastructure.” Cold Spring Harb Symp Quant Biol 68: 213–15.

Birney, E., Andrews, T. D., et al. (2004). “An overview of Ensembl.” Genome Res 14(5): 925–8.

Birney, E., Stamatoyannopoulos, J. A., et al. (2007). “Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.” Nature 447(7146): 799–816.

Bishop, A. C., Xu, J., et al. (2002). “Identification of the tRNA-dihydrouridine synthase family.” J Biol Chem 277(28): 25090–5.

Blake, J. A., Richardson, J. E., et al. (2003). “MGD: the Mouse Genome Database.” Nucleic Acids Res 31(1): 193–5.

Blanchette, M., Kent, W. J., et al. (2004). “Aligning multiple genomic sequences with the threaded blockset aligner.” Genome Res 14(4): 708–15.

Blankenberg, D., Taylor, J., et al. (2007). “A framework for collaborative analysis of ENCODE data: making large-scale analyses biologist-friendly.” Genome Res 17(6): 960–4.

Bray, N. and Pachter, L. (2004). “MAVID: constrained ancestral alignment of multiple sequences.” Genome Res 14(4): 693–9.

Brent, M. R. (2007). “How does eukaryotic gene prediction work?” Nat Biotechnol 25(8): 883–5.

Brudno, M., Do, C. B., et al. (2003a). “LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA.” Genome Res 13(4): 721–31.

Brudno, M., Malde, S., et al. (2003b). “Glocal alignment: finding rearrangements during alignment.” Bioinformatics 19(Suppl 1): i54–62.

Burge, C. and Karlin, S. (1997). “Prediction of complete gene structures in human genomic DNA.” J Mol Biol 268(1): 78–94.

Caspi, R., Foerster, H., et al. (2006). “MetaCyc: a multiorganism database of metabolic pathways and enzymes.” Nucleic Acids Res 34(Database issue): D511–16.

Choi, K., Ma, Y., et al. (2005). “PLATCOM: a Platform for Computational Comparative Genomics.” Bioinformatics 21(10): 2514–16.

Christie, K. R., Weng, S., et al. (2004). “Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms.” Nucleic Acids Res 32(Database issue): D311–14.

Cooper, G. M., Stone, E. A., et al. (2005). “Distribution and intensity of constraint in mammalian genomic sequence.” Genome Res 15(7): 901–13.

Curwen, V., Eyras, E., et al. (2004). “The Ensembl automatic gene annotation system.” Genome Res 14(5): 942–50.

Dewey, C. N. and Pachter, L. (2006). “Evolution at the nucleotide level: the problem of multiple whole-genome alignment.” Hum Mol Genet 15(Spec No 1): R51–6.

Dowell, R. D., Jokerst, R. M., et al. (2001). “The distributed annotation system.” BMC Bioinformatics 2: 7.

Drmanac, R., Labat, I., et al. (1989). “Sequencing of megabase plus DNA by hybridization: theory of the method.” Genomics 4(2): 114–28.

DuBois, P. (2005). MySQL, Sams Developer's Library.

Durbin, R., Eddy, S., et al. (1998). Biological Sequence Analysis, Cambridge University Press.

Eilbeck, K., Lewis, S. E., et al. (2005). “The Sequence Ontology: a tool for the unification of genome annotations.” Genome Biol 6(5): R44.

Eppig, J. T., Blake, J. A., et al. (2007). “The mouse genome database (MGD): new features facilitating a model system.” Nucleic Acids Res 35(Database issue): D630–7.

Flicek, P., Aken, B. L., et al. (2008). “Ensembl 2008.” Nucleic Acids Res 36(Database issue): D707–14.

Frazer, K. A., Ballinger, D. G., et al. (2007). “A second generation human haplotype map of over 3.1 million SNPs.” Nature 449(7164): 851–61.

Frazer, K. A., Pachter, L., et al. (2004). “VISTA: computational tools for comparative genomics.” Nucleic Acids Res 32(Web Server issue): W273–9.

Furey, T. S. (2006). “Comparison of human (and other) genome browsers.” Hum Genomics 2(4): 266–70.

Furey, T. S., Diekhans, M., et al. (2004). “Analysis of human mRNAs with the reference genome sequence reveals potential errors, polymorphisms, and RNA editing.” Genome Res 14(10B): 2034–40.

Gerstein, M. B., Bruce, C., et al. (2007). “What is a gene, post-ENCODE? History and updated definition.” Genome Res 17(6): 669–81.

Giardine, B., Riemer, C., et al. (2005). “Galaxy: a platform for interactive large-scale genome analysis.” Genome Res 15(10): 1451–5.

Gilbert, D. G. (2007). “DroSpeGe: rapid access database for new Drosophila species genomes.” Nucleic Acids Res 35(Database issue): D480–5.

Green, P. (2007). “2x genomes: does depth matter?” Genome Res 17(11): 1547–9.

Green, R. E., Krause, J., et al. (2006). “Analysis of one million base pairs of Neanderthal DNA.” Nature 444(7117): 330–6.

Green, R. E., Lewis, B. P., et al. (2003). “Widespread predicted nonsense-mediated mRNA decay of alternatively-spliced transcripts of human normal and disease genes.” Bioinformatics 19(Suppl 1): i118–21.

Griffiths-Jones, S., Moxon, S., et al. (2005). “Rfam: annotating non-coding RNAs in complete genomes.” Nucleic Acids Res 33(Database issue): D121–4.

Gross, S. S. and Brent, M. R. (2006). “Using multiple alignments to improve gene prediction.” J Comput Biol 13(2): 379–93.

Harrow, J., Denoeud, F., et al. (2006). “GENCODE: producing a reference annotation for ENCODE.” Genome Biol 7(Suppl 1): S4.1–9.

Holzner, S. (1999). Perl Core Language, Coriolis.

Hoon, S., Ratnapu, K. K., et al. (2003). “Biopipe: a flexible framework for protocol-based bioinformatics analysis.” Genome Res 13(8): 1904–15.

Hsu, F., Pringle, T. H., et al. (2005). “The UCSC Proteome Browser.” Nucleic Acids Res 33(Database issue): D454–8.

Hubbard, T. J., Aken, B. L., et al. (2007). “Ensembl 2007.” Nucleic Acids Res 35(Database issue): D610–17.

Hull, D., Wolstencroft, K., et al. (2006). “Taverna: a tool for building and running workflows of services.” Nucleic Acids Res 34(Web Server issue): W729–32.

Hüttenhofer, A., Schattner, P., et al. (2005). “Non-coding RNAs: hope or hype?” Trends Genet 21(5): 289–97.

Iafrate, A. J., Feuk, L., et al. (2004). “Detection of large-scale variation in the human genome.” Nat Genet 36(9): 949–51.

Jaiswal, P., Ni, J., et al. (2006). “Gramene: a bird's eye view of cereal genomes.” Nucleic Acids Res 34(Database issue): D717–23.

Kapustin, Yu., Souvorov, A., et al. (2004). “Splign – a hybrid approach to spliced alignments.” RECOMB 2004 – Currents in Comp Mol Bio: 741.

Karolchik, D., Hinrichs, A. S., et al. (2004). “The UCSC Table Browser data retrieval tool.” Nucleic Acids Res 32(Database issue): D493–6.

Karolchik, D., Kuhn, R. M., et al. (2008). “The UCSC Genome Browser database: 2008 update.” Nucleic Acids Res 36(Database issue): D773–9.

Kartalov, E. P. and Quake, S. R. (2004). “Microfluidic device reads up to four consecutive base pairs in DNA sequencing-by-synthesis.” Nucleic Acids Res 32(9): 2873–9.

Kasprzyk, A., Keefe, D., et al. (2004). “EnsMart: a generic system for fast and flexible access to biological data.” Genome Res 14(1): 160–9.

Kent, W. J. (2002). “BLAT – the BLAST-like alignment tool.” Genome Res 12(4): 656–64.

Kent, W. J., Baertsch, R., et al. (2003). “Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes.” Proc Natl Acad Sci U S A 100(20): 11484–9.

Kent, W. J., Hsu, F., et al. (2005). “Exploring relationships and mining data with the UCSC Gene Sorter.” Genome Res 15(5): 737–41.

Korbel, J. O., Urban, A. E., et al. (2007). “Paired-end mapping reveals extensive structural variation in the human genome.” Science 318(5849): 420–6.

Kuhn, R. M., Karolchik, D., et al. (2007). “The UCSC Genome Browser database: update 2007.” Nucleic Acids Res 35(Database issue): D668–73.

Leamon, J. H. and Rothberg, J. M. (2007). “Cramming more sequencing reactions onto microreactor chips.” Chem Rev 107(8): 3367–76.

Lee, T. J., Pouliot, Y., et al. (2006). “BioWarehouse: a bioinformatics database warehouse toolkit.” BMC Bioinformatics 7: 170.

Lestrade, L. and Weber, M. J. (2006). “snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs.” Nucleic Acids Res 34(Database issue): D158–62.

Lev-Maor, G., Sorek, R., et al. (2003). “The birth of an alternatively spliced exon: 3′ splice-site selection in Alu exons.” Science 300(5623): 1288–91.

Levanon, E. Y., Eisenberg, E., et al. (2004). “Systematic identification of abundant A-to-I editing sites in the human transcriptome.” Nat Biotechnol 22(8): 1001–5.

Lewis, S. E., Searle, S. M., et al. (2002). “Apollo: a sequence annotation editor.” Genome Biol 3(12): RESEARCH0082.

Ma, B., Tromp, J., et al. (2002). “PatternHunter: faster and more sensitive homology search.” Bioinformatics 18(3): 440–5.

Margulies, E. H., Cooper, G. M., et al. (2007). “Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome.” Genome Res 17(6): 760–74.

Markowitz, V. M., Korzeniewski, F., et al. (2006a). “The integrated microbial genomes (IMG) system.” Nucleic Acids Res 34(Database issue): D344–8.

Markowitz, V. M., Ivanova, N., et al. (2006b). “An experimental metagenome data management and analysis system.” Bioinformatics 22(14): e359–67.

Markowitz, V. M., Ivanova, N. N., et al. (2008). “IMG/M: a data management and analysis system for metagenomes.” Nucleic Acids Res 36(Database issue): D534–8.

Mount, D. W. (2004). Bioinformatics: Sequence and Genome Analysis, 2nd Edition, Cold Spring Harbor Laboratory Press.

Mungall, C. J. and Emmert, D. B. (2007). “A Chado case study: an ontology-based modular schema for representing genome-associated biological information.” Bioinformatics 23(13): i337–46.

Noonan, J. P., Hofreiter, M., et al. (2005). “Genomic sequencing of Pleistocene cave bears.” Science 309(5734): 597–9.

Oinn, T., Addis, M., et al. (2004). “Taverna: a tool for the composition and enactment of bioinformatics workflows.” Bioinformatics 20(17): 3045–54.

Olson, M. (2007). “Enrichment of super-sized resequencing targets from the human genome.” Nat Methods 4(11): 891–2.

Pedersen, J. S., Bejerano, G., et al. (2006). “Identification and classification of conserved RNA secondary structures in the human genome.” PLoS Comput Biol 2(4): e33.

Pond, S. L., Frost, S. D., et al. (2005). “HyPhy: hypothesis testing using phylogenies.” Bioinformatics 21(5): 676–9.

Pontius, J. U., Mullikin, J. C., et al. (2007). “Initial sequence and comparative analysis of the cat genome.” Genome Res 17(11): 1675–89.

Potter, S. C., Clarke, L., et al. (2004). “The Ensembl analysis pipeline.” Genome Res 14(5): 934–41.

Prakash, A. and Tompa, M. (2007). “Measuring the accuracy of genome-size multiple alignments.” Genome Biol 8(6): R124.

Primrose, S. B. and Twyman, R. M. (2006). Principles of Gene Manipulation and Genomics, 7th Edition, Blackwell Publishing.

Pruitt, K. D., Tatusova, T., et al. (2007). “NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.” Nucleic Acids Res 35(Database issue): D61–5.

Rampp, M., Soddemann, T., et al. (2006). “The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis.” Nucleic Acids Res 34(Web Server issue): W15–19.

Reek, K. A. (1998). Pointers on C, Addison-Wesley.

Reich, M., Liefeld, T., et al. (2006). “GenePattern 2.0.” Nat Genet 38(5): 500–1.

Rice, P., Longden, I., et al. (2000). “EMBOSS: the European Molecular Biology Open Software Suite.” Trends Genet 16(6): 276–7.

Rogers, Y. H. and Venter, J. C. (2005). “Genomics: massively parallel sequencing.” Nature 437(7057): 326–7.

Schattner, P., Barberan-Soler, S., et al. (2006). “A computational screen for mammalian pseudouridylation guide H/ACA RNAs.” RNA 12(1): 15–25.

Schneider, K. L., Pollard, K. S., et al. (2006). “The UCSC Archaeal Genome Browser.” Nucleic Acids Res 34(Database issue): D407–10.

Schwartz, S., Kent, W. J., et al. (2003). “Human-mouse alignments with BLASTZ.” Genome Res 13(1): 103–7.

Schwarz, E. M., Antoshechkin, I., et al. (2006). “WormBase: better software, richer content.” Nucleic Acids Res 34(Database issue): D475–8.

Sebat, J., Lakshmi, B., et al. (2004). “Large-scale copy number polymorphism in the human genome.” Science 305(5683): 525–8.

Shah, S. P., Huang, Y., et al. (2005). “Atlas – a data warehouse for integrative bioinformatics.” BMC Bioinformatics 6: 34.

Siepel, A., Bejerano, G., et al. (2005). “Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes.” Genome Res 15(8): 1034–50.

Slater, G. S. and Birney, E. (2005). “Automated generation of heuristics for biological sequence comparison.” BMC Bioinformatics 6: 31.

Southern, E. M., Maskos, U., et al. (1992). “Analyzing and comparing nucleic acid sequences by hybridization to arrays of oligonucleotides: evaluation using experimental models.” Genomics 13(4): 1008–17.

Stabenau, A., McVicker, G., et al. (2004). “The Ensembl core software libraries.” Genome Res 14(5): 929–33.

Stajich, J. E., Block, D., et al. (2002). “The Bioperl toolkit: Perl modules for the life sciences.” Genome Res 12(10): 1611–18.

Stajich, J. E. and Lapp, H. (2006). “Open source tools and toolkits for bioinformatics: significance, and where are we?” Brief Bioinform 7(3): 287–96.

Stein, L. D. (2003). “Integrating biological databases.” Nat Rev Genet 4(5): 337–45.

Stein, L. D., Mungall, C., et al. (2002). “The generic genome browser: a building block for a model organism system database.” Genome Res 12(10): 1599–610.

Stevens, R. D., Tipney, H. J., et al. (2004). “Exploring Williams-Beuren syndrome using myGrid.” Bioinformatics 20(Suppl 1): i303–10.

Subramaniam, S. (1998). “The Biology Workbench – a seamless database and analysis environment for the biologist.” Proteins 32(1): 1–2.

Sundquist, A., Ronaghi, M., et al. (2007). “Whole-genome sequencing and assembly with high-throughput, short-read technologies.” PLoS ONE 2(5): e484.

Thomas, D. J., Rosenbloom, K. R., et al. (2007). “The ENCODE Project at UC Santa Cruz.” Nucleic Acids Res 35(Database issue): D663–7.

Thornton, J. W., Need, E., et al. (2003). “Resurrecting the ancestral steroid receptor: ancient origin of estrogen signaling.” Science 301(5640): 1714–17.

Tisdall, J. D. (2001). Beginning Perl for Bioinformatics, O'Reilly.

Tisdall, J. D. (2003). Mastering Perl for Bioinformatics, O'Reilly.

Wheeler, D. L., Barrett, T., et al. (2005). “Database resources of the National Center for Biotechnology Information.” Nucleic Acids Res 33(Database issue): D39–45.

Wheeler, D. L., Barrett, T., et al. (2006). “Database resources of the National Center for Biotechnology Information.” Nucleic Acids Res 34(Database issue): D173–80.

Wheeler, D. L., Barrett, T., et al. (2007). “Database resources of the National Center for Biotechnology Information.” Nucleic Acids Res 35(Database issue): D5–12.

Wheeler, D. L., Barrett, T., et al. (2008). “Database resources of the National Center for Biotechnology Information.” Nucleic Acids Res 36(Database issue): D13–21.

Will, C. L. and Lührmann, R. (2005). “Splicing of a rare class of introns by the U12-dependent spliceosome.” Biol Chem 386(8): 713–24.

Zdobnov, E. M., Lopez, R., et al. (2002). “The EBI SRS server – recent developments.” Bioinformatics 18(2): 368–73.

Zimmermann, O., Tomlinson, M., et al. (2005). Perspectives on Web Services, Springer.

Contents

Frontmatter

Contents

Preface

1 - The Molecular Biology Data Explosion

2 - Introduction to Genome Browsing with the UCSC Genome Browser

3 - Browsing with Ensembl, MapViewer, and Other Genome Browsers

4 - Interactive Genome-Database Batch Querying

5 - Interactive Batch Post-Processing with Galaxy

6 - Introduction to Programmed Querying

7 - Using the Ensembl API

8 - Programmed Querying with Ensembl, Continued

9 - Introduction to the UCSC API

10 - More Advanced Applications Using the UCSC API

11 - Customized Genome Databases

12 - Genomes, Browsers, Databases – The Future

Appendix 1 - Coordinate System Conventions

Appendix 2 - Genome Data Formats

Appendix 3 - UCSC Table Formats

Appendix 4 - Genomic Sequence Alignments

Appendix 5 - Program Code README File

Appendix 6 - Selected General References for Genome Databases and Browsers

Appendix 7 - Online Documentation and Useful Web Sites for Genome Databases and Browsers

Appendix 8 - Glossary of Biological and Computer Terms Used in the Text

References

Index
