  • Print publication year: 2016
  • Online publication date: June 2016

15 - From sequence reads to evolutionary inferences

from Part III - Next Generation Challenges and Questions
Summary

Introduction

The history of molecular systematics can be caricatured as one of ever-increasing depth of sequence data, analysed by ever more complex models. In this respect, sequence data from whole genomes are the ultimate source of molecular markers that can act as characters for phylogenetic or population genetic analysis. While complete genomes in the strictest sense are only available for very few species, and fragmentary genome assemblies that capture the entire genome, but in many pieces, are also fairly restricted in scope beyond the prokaryotes, this is changing rapidly. More-or-less shallow genomic data, for example from EST sequencing projects, high-throughput transcriptome sequencing or some other kind of reduced-representation sequencing (see review by Davey et al. 2011) are now becoming widespread and of increasing utility in systematics and other areas of evolutionary biology. Studies using these kinds of data to reconstruct relationships between species have become known as ‘phylogenomics’, although the original usage of the term referred to using phylogenetic approaches to infer gene function (Eisen 1998), and the other parts of the research programme proposed under this name (Eisen and Fraser 2003) have been subsumed into the broader study of comparative and evolutionary genomics. Moreover, the term ‘phylogenomics’ has, perhaps, become over-extended, as datasets that claim this title vary in size and can be as few as 11 markers (Horvath et al. 2008) or as little as 30 kb of sequence data (Wiegmann et al. 2011), and in eukaryotic organisms, the ‘genomes’ in question are very often organelle (mitochondrial or chloroplast) genome sequences. 
Sequence data from whole genomes have the potential to be a rich source of molecular phylogenetic markers for any systematic question, but there are two areas in which large-scale, highly multi-locus data appear most valuable – occupying the two extremes of the range of timescales over which inference about evolutionary history is made.

Genome-scale data promise the ability to resolve ancient divergences, and in particular, fairly rapid (at least in geological terms) ancient radiations that have been difficult to reliably reconstruct with more limited molecular datasets. In this context, phylogenomic data have been applied to a wide taxonomic range of phylogenetic questions. Early usage of whole-genome data was in prokaryote systematics (e.g. Daubin et al. 2002).


Next Generation Systematics
  • Online ISBN: 9781139236355
  • Book DOI: https://doi.org/10.1017/CBO9781139236355
Aguinaldo, A. M., Turbeville, J. M., Linford, L. S., et al. (1997). Evidence for a clade of nematodes, arthropods and other moulting animals. Nature, 387, 489–93.
Altenhoff, A. M. and Dessimoz, C. (2012). Inferring orthology and paralogy. Methods in Molecular Biology, 855, 259–79.
Altshuler, D., Pollara, V. J., Cowles, C. R., et al. (2000). An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature, 407, 513–16.
Ané, C., Larget, B., Baum, D. A., Smith, S. D. and Rokas, A. (2007). Bayesian estimation of concordance among gene trees. Molecular Biology and Evolution, 24, 412–26.
Anisimova, M. and Gascuel, O. (2006). Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Systematic Biology, 55, 539–52.
Assefa, S., Keane, T. M., Otto, T. D., Newbold, C. and Berriman, M. (2009). ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics, 25, 1968–9.
Baird, N. A., Etter, P. D., Atwood, T. S., et al. (2008). Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One, 3, e3376.
Bapteste, E., Susko, E., Leigh, J., et al. (2007). Alternative methods for concatenation of core genes indicate a lack of resolution in deep nodes of the prokaryotic phylogeny. Molecular Biology and Evolution, 25, 83–91.
Barry, D. and Hartigan, J. A. (1987). Asynchronous distance between homologous DNA sequences. Biometrics, 43, 261–76.
Beaumont, M. A. (2010). Approximate Bayesian computation in evolution and ecology. Annual Review of Ecology, Evolution, and Systematics, 41, 379–406.
Blackshields, G., Wallace, I. M., Larkin, M. and Higgins, D. G. (2006). Analysis and comparison of benchmarks for multiple sequence alignment. In Silico Biology, 6, 321–39.
Blair, C. and Murphy, R. W. (2010). Recent trends in molecular phylogenetic analysis: where to next? Journal of Heredity, 102, 130–8.
Blair, J. E., Ikeo, K., Gojobori, T. and Hedges, S. B. (2002). The evolutionary position of nematodes. BMC Evolutionary Biology, 2, 7.
Blanquart, S. and Lartillot, N. (2008). A site- and time-heterogeneous model of amino acid replacement. Molecular Biology and Evolution, 25, 842–58.
Boetzer, M. and Pirovano, W. (2012). Toward almost closed genomes with GapFiller. Genome Biology, 13, R56.
Bradnam, K. R., Fass, J. N., Alexandrov, A., et al. (2013). Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. GigaScience, 2, 10.
Breese, M. R. and Liu, Y. (2013). NGSUtils: a software suite for analyzing and manipulating next-generation sequencing datasets. Bioinformatics, 29, 494–6.
Brown, J. M. and Lemmon, A. R. (2007). The importance of data partitioning and the utility of Bayes factors in Bayesian phylogenetics. Systematic Biology, 56, 643–55.
Browning, S. R. and Browning, B. L. (2011). Haplotype phasing: existing methods and new developments. Nature Reviews Genetics, 12, 703–14.
Bybee, S. M., Bracken-Grissom, H., Haynes, B. D., et al. (2011). Targeted Amplicon Sequencing (TAS): a scalable next-gen approach to multilocus, multitaxa phylogenetics. Genome Biology and Evolution, 3, 1312–23.
Capella-Gutierrez, S., Silla-Martinez, J. M. and Gabaldon, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25, 1972–3.
Carstens, B. C., Pelletier, T. A., Reid, N. M. and Satler, J. D. (2013). How to fail at species delimitation. Molecular Ecology, 22, 4369–83.
Castresana, J. (2000). Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution, 17, 540–52.
Chain, P. S. G., Grafham, D. V., Fulton, R. S., et al. (2009). Genomics: genome project standards in a new era of sequencing. Science, 326, 236–7.
Choi, S. C. and Hey, J. (2011). Joint inference of population assignment and demographic history. Genetics, 189, 561–77.
Ciccarelli, F., Doerks, T., von Mering, C., et al. (2006). Toward automatic reconstruction of a highly resolved Tree of Life. Science, 311, 1283–7.
Compeau, P. E. C., Pevzner, P. A. and Tesler, G. (2011). How to apply de Bruijn graphs to genome assembly. Nature Biotechnology, 29, 987–91.
Cotton, J. A. and Page, R. D. M. (2005). Rates and patterns of gene duplication and loss in the human genome. Proceedings of the Royal Society B-Biological Sciences, 272, 277–83.
Cotton, J. A. and Wilkinson, M. (2009). Supertrees join the mainstream of phylogenetics. Trends in Ecology and Evolution, 24, 1–3.
Cox, C. J., Foster, P. G., Hirt, R. P., Harris, S. R. and Embley, T. M. (2008). The archaebacterial origin of eukaryotes. Proceedings of the National Academy of Sciences of the United States of America, 105, 20356–61.
Creevey, C. J., Muller, J., Doerks, T., et al. (2011). Identifying single copy orthologs in Metazoa. PLoS Computational Biology, 7, e1002269.
Csilléry, K., Blum, M. G. B., Gaggiotti, O. E. and François, O. (2010). Approximate Bayesian Computation (ABC) in practice. Trends in Ecology and Evolution, 25, 410–18.
Dagan, T. and Martin, W. (2006). The tree of one percent. Genome Biology, 7, 118.
Dalquen, D. A. and Dessimoz, C. (2013). Bidirectional best hits miss many orthologs in duplication-rich clades such as plants and animals. Genome Biology and Evolution, 5, 1800–6.
Danecek, P., Auton, A., Abecasis, G., et al. (2011). The variant call format and VCFtools. Bioinformatics, 27, 2156–8.
Daubin, V., Gouy, M. and Perrière, G. (2002). A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history. Genome Research, 12, 1080–90.
Davey, J. W., Hohenlohe, P. A., Etter, P. D., et al. (2011). Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Reviews Genetics, 12, 499–510.
de Koning, A. P. J., Gu, W., Castoe, T. A., Batzer, M. A. and Pollock, D. D. (2011). Repetitive elements may comprise over two-thirds of the human genome. PLoS Genetics, 7, e1002384.
de Queiroz, A., Donoghue, M. J. and Kim, J. (1995). Separate versus combined analysis of phylogenetic evidence. Annual Review of Ecology and Systematics, 26, 657–81.
Degnan, J. H. and Rosenberg, N. A. (2006). Discordance of species trees with their most likely gene trees. PLoS Genetics, 2, e68.
DeLuca, D. S., Levin, J. Z., Sivachenko, A., et al. (2012). RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics, 28, 1530–2.
Downing, T., Imamura, H., Decuypere, S., et al. (2011). Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance. Genome Research, 21, 2143–56.
Dunn, C. W., Hejnol, A., Matus, D. Q., et al. (2008). Broad phylogenomic sampling improves resolution of the animal Tree of Life. Nature, 452, 745–9.
Dunn, C. W., Howison, M. and Zapata, F. (2013). Agalma: an automated phylogenomics workflow. BMC Bioinformatics, 14, 330.
Edgecombe, G. D., Giribet, G., Dunn, C. W., et al. (2011). Higher-level Metazoan relationships: recent progress and remaining questions. Organisms Diversity and Evolution, 11, 151–72.
Edwards, S. V., Liu, L. and Pearl, D. K. (2007). High-resolution species trees without concatenation. Proceedings of the National Academy of Sciences of the United States of America, 104, 5936–41.
Eisen, J. A. (1998). Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Research, 8, 163–7.
Eisen, J. A. and Fraser, C. M. (2003). Phylogenomics: intersection of evolution and genomics. Science, 300, 1706–7.
Erixon, P., Svennblad, B., Britton, T. and Oxelman, B. (2003). Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Systematic Biology, 52, 665–73.
Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theoretical Population Biology, 3, 87–112.
Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V. C. and Foll, M. (2013). Robust demographic inference from genomic and SNP data. PLoS Genetics, 9, e1003905.
Fedrigo, O., Naylor, G. and Collins, T. (2005). Choosing the best genes for the job: the case for stationary genes in genome-scale phylogenetics. Systematic Biology, 54, 493–500.
Flouri, T., Izquierdo-Carrasco, F., Darriba, D., et al. (2015). The phylogenetic likelihood library. Systematic Biology, 64, 356–62.
Fonseca, N. A., Rung, J., Brazma, A. and Marioni, J. C. (2012). Tools for mapping high-throughput sequencing data. Bioinformatics, 28, 3169–77.
Foster, P. G. (2004). Modeling compositional heterogeneity. Systematic Biology, 53, 485–95.
Galtier, N. and Gouy, M. (1998). Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Molecular Biology and Evolution, 15, 871–9.
Gascuel, O. and Steel, M. (2006). Neighbor-joining revealed. Molecular Biology and Evolution, 23, 1997–2000.
Gatesy, J. and Baker, R. (2005). Hidden likelihood support in genomic data: can forty-five wrongs make a right? Systematic Biology, 54, 483–92.
Gayral, P., Melo-Ferreira, J., Glémin, S., et al. (2013). Reference-free population genomics from next-generation transcriptome data and the vertebrate–invertebrate gap. PLoS Genetics, 9, e1003457.
Gee, H. (2003). Evolution: ending incongruence. Nature, 425, 782.
Gnirke, A., Melnikov, A., Maguire, J., et al. (2009). Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nature Biotechnology, 27, 182–9.
Godden, G. T., Jordon-Thaden, I. E. and Chamala, S. (2012). Making next-generation sequencing work for you: approaches and practical considerations for marker development and phylogenetics. Plant Ecology and Diversity, 5, 427–50.
Goloboff, P. A., Farris, J. S. and Nixon, K. C. (2008). TNT, a free program for phylogenetic analysis. Cladistics, 24, 774–86.
Goodman, M., Czelusniak, J., Moore, G. W., Romero-Herrera, A. E. and Matsuda, G. (1979). Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms from globin sequences. Systematic Zoology, 28, 132–63.
Grant, J. R. and Katz, L. A. (2014). Building a phylogenomic pipeline for the eukaryotic tree of life – addressing deep phylogenies with genome-scale data. PLoS Currents Apr, 6.
Gremme, G., Steinbiss, S. and Kurtz, S. (2013). GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10, 645–56.
Gronau, I., Hubisz, M. J., Gulko, B., Danko, C. G. and Siepel, A. (2011). Bayesian inference of ancient human demography from individual genome sequences. Nature Genetics, 43, 1031–4.
Guindon, S. and Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology, 52, 696–704.
Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H. and Bustamante, C. D. (2009). Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genetics, 5, e1000695.
Harris, K. and Nielsen, R. (2013). Inferring demographic history from a spectrum of shared haplotype lengths. PLoS Genetics, 9, e1003521.
Heled, J. and Drummond, A. J. (2010). Bayesian inference of species trees from multilocus data. Molecular Biology and Evolution, 27, 570–80.
Hess, J. and Goldman, N. (2011). Addressing inter-gene heterogeneity in maximum likelihood phylogenomic analysis: yeasts revisited. PLoS One, 6, e22783.
Hobolth, A., Christensen, O. F., Mailund, T. and Schierup, M. H. (2007). Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model. PLoS Genetics, 3, e7.
Holland, B. R. (2004). Using consensus networks to visualize contradictory evidence for species phylogeny. Molecular Biology and Evolution, 21, 1459–61.
Holland, B. R., Jarvis, P. D. and Sumner, J. G. (2012). Low-parameter phylogenetic inference under the general Markov model. Systematic Biology, 62, 78–92.
Horvath, J. E., Weisrock, D. W., Embry, S. L., et al. (2008). Development and application of a phylogenomic toolkit: resolving the evolutionary history of Madagascar's lemurs. Genome Research, 18, 489–99.
Hunt, M., Newbold, C., Berriman, M. and Otto, T. D. (2014). A comprehensive evaluation of assembly scaffolding tools. Genome Biology, 15, R42.
Iqbal, Z., Caccamo, M., Turner, I., Flicek, P. and McVean, G. (2012). De novo assembly and genotyping of variants using colored de Bruijn graphs. Nature Genetics, 44, 226–32.
Jeffroy, O., Brinkmann, H., Delsuc, F. and Philippe, H. (2006). Phylogenomics: the beginning of incongruence? Trends in Genetics, 22, 225–31.
Jones, M. O., Koutsovoulos, G. D. and Blaxter, M. L. (2011). iPhy: an integrated phylogenetic workbench for supermatrix analyses. BMC Bioinformatics, 12, 30.
Kao, R. R., Haydon, D. T., Lycett, S. J. and Murcia, P. R. (2014). Supersize me: how whole-genome sequencing and big data are transforming epidemiology. Trends in Microbiology, 22, 282–91.
Koren, S., Harhay, G. P., Smith, T. P., et al. (2013). Reducing assembly complexity of microbial genomes with single-molecule sequencing. Genome Biology, 14, R101.
Kubatko, L. S., Carstens, B. C. and Knowles, L. L. (2009). STEM: species tree estimation using maximum likelihood for gene trees under coalescence. Bioinformatics, 25, 971–3.
Kumar, S., Filipski, A. J., Battistuzzi, F. U., Kosakovsky Pond, S. L. and Tamura, K. (2012). Statistics and truth in phylogenomics. Molecular Biology and Evolution, 29, 457–72.
Landan, G. and Graur, D. (2007). Heads or tails: a simple reliability check for multiple sequence alignments. Molecular Biology and Evolution, 24, 1380–3.
Lanfear, R., Calcott, B., Ho, S. Y. W. and Guindon, S. (2012). PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution, 29, 1695–701.
Lartillot, N. and Philippe, H. (2004). A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Molecular Biology and Evolution, 21, 1095–109.
Latreille, P., Norton, S., Goldman, B. S., et al. (2007). Optical mapping as a routine tool for bacterial genome sequence finishing. BMC Genomics, 8, 321.
Lee, E. K., Cibrian-Jaramillo, A., Kolokotronis, S.-O., et al. (2011). A functional phylogenomic view of the seed plants. PLoS Genetics, 7, e1002411.
Lemmon, A. R., Brown, J. M., Stanger-Hall, K. and Lemmon, E. M. (2009). The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Systematic Biology, 58, 130–45.
Lemmon, A. R., Emme, S. A. and Lemmon, E. M. (2012). Anchored hybrid enrichment for massively high-throughput phylogenomics. Systematic Biology, 61, 727–44.
Lemmon, E. M. and Lemmon, A. R. (2013). High-throughput genomic data in systematics and phylogenetics. Annual Review of Ecology, Evolution, and Systematics, 44, 99–121.
Li, H. and Durbin, R. (2011). Inference of human population history from individual whole-genome sequences. Nature, 475, 493–6.
Li, H., Handsaker, B., Wysoker, A., et al. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25, 2078–9.
Li, H. and Homer, N. (2010). A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics, 11, 473–83.
Li, L., Stoeckert, C. J. and Roos, D. S. (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Research, 13, 2178–89.
Li, R., Zhu, H., Ruan, J., et al. (2010). De novo assembly of human genomes with massively parallel short read sequencing. Genome Research, 20, 265–72.
Liu, K., Raghavan, S., Nelesen, S., Linder, C. R. and Warnow, T. (2009). Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science, 324, 1561–4.
Liu, L., Yu, L., Kubatko, L., Pearl, D. K. and Edwards, S. V. (2009). Coalescent methods for estimating phylogenetic trees. Molecular Phylogenetics and Evolution, 53, 320–8.
Löytynoja, A. and Goldman, N. (2005). An algorithm for progressive multiple alignment of sequences with insertions. Proceedings of the National Academy of Sciences of the United States of America, 102, 10557–62.
Löytynoja, A. and Milinkovitch, M. C. (2001). SOAP: cleaning multiple alignments from unstable blocks. Bioinformatics, 17, 573–4.
Maddison, W. and Knowles, L. (2006). Inferring phylogeny despite incomplete lineage sorting. Systematic Biology, 55, 21–30.
Mallatt, J. M., Garey, J. R. and Shultz, J. W. (2004). Ecdysozoan phylogeny and Bayesian inference: first use of nearly complete 28S and 18S rRNA gene sequences to classify the arthropods and their kin. Molecular Phylogenetics and Evolution, 31, 178–91.
Mamanova, L., Coffey, A. J., Scott, C. E., et al. (2010). Target-enrichment strategies for next-generation sequencing. Nature Methods, 7, 111–18.
Manske, M., Miotto, O., Campino, S., et al. (2012). Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature, 487, 375–9.
McCormack, J. E., Hird, S. M., Zellmer, A. J., Carstens, B. C. and Brumfield, R. T. (2013). Applications of next-generation sequencing to phylogeography and phylogenetics. Molecular Phylogenetics and Evolution, 66, 526–38.
McVean, G. A. T. and Cardin, N. J. (2005). Approximating the coalescent with recombination. Philosophical Transactions of the Royal Society B-Biological Sciences, 360, 1387–93.
Medvedev, P., Stanciu, M. and Brudno, M. (2009). Computational methods for discovering structural variation with next-generation sequencing. Nature Methods, 6, S13–S20.
Miller, J. R., Koren, S. and Sutton, G. (2010). Assembly algorithms for next-generation sequencing data. Genomics, 95, 315–27.
Morrison, D. A. and Ellis, J. T. (1997). Effects of nucleotide sequence alignment on phylogeny estimation: a case study of 18S rDNAs of Apicomplexa. Molecular Biology and Evolution, 14, 428–41.
Mullikin, J. C. and Ning, Z. (2003). The Phusion Assembler. Genome Research, 13, 81–90.
Nguyen-Dumont, T., Pope, B. J., Hammet, F., Southey, M. C. and Park, D. J. (2013). A high-plex PCR approach for massively parallel sequencing. BioTechniques, 55, 69–74.
Nichols, R. (2001). Gene trees and species trees are not the same. Trends in Ecology and Evolution, 16, 358–64.
Nielsen, R., Hellmann, I., Hubisz, M., Bustamante, C. and Clark, A. G. (2007). Recent and ongoing selection in the human genome. Nature Reviews Genetics, 8, 857–68.
Nielsen, R., Paul, J. S., Albrechtsen, A. and Song, Y. S. (2011). Genotype and SNP calling from next-generation sequencing data. Nature Reviews Genetics, 12, 443–51.
Nosenko, T., Schreiber, F., Adamska, M., et al. (2013). Deep metazoan phylogeny: when different genes tell different stories. Molecular Phylogenetics and Evolution, 67, 223–33.
Nylander, J. A. A., Ronquist, F., Huelsenbeck, J. P. and Nieves-Aldrey, J.-L. (2004). Bayesian phylogenetic analysis of combined data. Systematic Biology, 53, 47–67.
Ogden, T. H. and Rosenberg, M. S. (2006). Multiple sequence alignment accuracy and phylogenetic inference. Systematic Biology, 55, 314–28.
Page, R. D. and Charleston, M. A. (1997). From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Molecular Phylogenetics and Evolution, 7, 231–40.
Pagel, M. and Meade, A. (2004). A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Systematic Biology, 53, 571–81.
Parkhill, J. (2002). The importance of complete genome sequences. Trends in Microbiology, 10, 219–20; author reply 220.
Penny, D., McComish, B. J., Charleston, M. A. and Hendy, M. D. (2014). Mathematical elegance with biochemical realism: the covarion model of molecular evolution. Journal of Molecular Evolution, 53, 711–23.
Perkel, J. (2008). SNP genotyping: six technologies that keyed a revolution. Nature Methods, 5, 447–53.
Philip, G. K., Creevey, C. J. and McInerney, J. O. (2005). The Opisthokonta and the Ecdysozoa may not be clades: stronger support for the grouping of plant and animal than for animal and fungi and stronger support for the Coelomata than Ecdysozoa. Molecular Biology and Evolution, 22, 1175–84.
Philippe, H., Delsuc, F., Brinkmann, H. and Lartillot, N. (2005a). Phylogenomics. Annual Review of Ecology, Evolution, and Systematics, 36, 541–62.
Philippe, H., Lartillot, N. and Brinkmann, H. (2005b). Multigene analyses of bilaterian animals corroborate the monophyly of Ecdysozoa, Lophotrochozoa, and Protostomia. Molecular Biology and Evolution, 22, 1246–53.
Phillips, M. J. (2004). Genome-scale phylogeny and the detection of systematic biases. Molecular Biology and Evolution, 21, 1455–8.
Pisani, D. (2004). Identifying and removing fast-evolving sites using compatibility analysis: an example from the Arthropoda. Systematic Biology, 53, 978–89.
Pisani, D., Cotton, J. A. and McInerney, J. O. (2007). Supertrees disentangle the chimerical origin of eukaryotic genomes. Molecular Biology and Evolution, 24, 1752–60.
Pons, J., Barraclough, T., Gómez-Zurita, J., et al. (2006). Sequence-based species delimitation for the DNA taxonomy of undescribed insects. Systematic Biology, 55, 595–609.
Pool, J. E., Hellmann, I., Jensen, J. D. and Nielsen, R. (2010). Population genetic inference from genomic sequence variation. Genome Research, 20, 291–300.
Posada, D. and Buckley, T. (2004). Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Systematic Biology, 53, 793–808.
Qiu, Y.-L., Li, L., Wang, B., et al. (2006). The deepest divergences in land plants inferred from phylogenomic evidence. Proceedings of the National Academy of Sciences of the United States of America, 103, 15511–16.
Quinlan, A. R. and Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26, 841–2.
Rannala, B. and Yang, Z. (2003). Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics, 164, 1645–56.
Rannala, B. and Yang, Z. (2008). Phylogenetic inference using whole genomes. Annual Review of Genomics and Human Genetics, 9, 217–31.
Rhaesa, A. S., Bartolomaeus, T., Lemburg, C., Ehlers, U. and Garey, J. R. (1998). The position of the Arthropoda in the phylogenetic system. Journal of Morphology, 238, 263–85.
Rodríguez-Ezpeleta, N., Brinkmann, H., Roure, B., et al. (2007). Detecting and overcoming systematic errors in genome-scale phylogenies. Systematic Biology, 56, 389–99.
Rokas, A., Williams, B. L., King, N. and Carroll, S. B. (2003). Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature, 425, 798–804.
Rosenberg, M. S., ed. (2011). Sequence Alignment: Methods, Models, Concepts, and Strategies. Oakland, CA, University of California Press.
Rosenberg, N. A. and Nordborg, M. (2002). Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nature Reviews Genetics, 3, 380–90.
Roth, A. C., Gonnet, G. H. and Dessimoz, C. (2009). Algorithm of OMA for large-scale orthology inference. BMC Bioinformatics, 10, 220.
Roure, B., Baurain, D. and Philippe, H. (2012). Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Molecular Biology and Evolution, 30, 197–214.
Salichos, L. and Rokas, A. (2014). Inferring ancient divergences requires genes with strong phylogenetic signals. Nature, 497, 327–31.
Sankoff, D., Morel, C. and Cedergren, R. J. (1973). Evolution of 5S RNA and the non-randomness of base replacement. Nature New Biology, 245, 232–4.
Scheinfeldt, L. B. and Tishkoff, S. A. (2013). Recent human adaptation: genomic approaches, interpretation and insights. Nature Reviews Genetics, 14, 692–702.
Schiffels, S. and Durbin, R. (2014). Inferring human population size and separation history from multiple genome sequences. Nature Genetics, 46, 919–25.
Schneeberger, K., Ossowski, S., Ott, F., et al. (2011). Reference-guided assembly of four diverse Arabidopsis thaliana genomes. Proceedings of the National Academy of Sciences of the United States of America, 108, 10249–54.
Scholtz, G. (2002). The Articulata hypothesis – or what is a segment? Organisms Diversity and Evolution, 2, 197–215.
Shapiro, B. (2005). Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Molecular Biology and Evolution, 23, 7–9.
Simpson, J. T. and Durbin, R. (2010). Efficient construction of an assembly string graph using the FM-Index. Bioinformatics, 26, i367–73.
Simpson, J. T. and Durbin, R. (2012). Efficient de novo assembly of large genomes using compressed data structures. Genome Research, 22, 549–56.
Simpson, J. T., Wong, K., Jackman, S. D., et al. (2009). ABySS: a parallel assembler for short read sequence data. Genome Research, 19, 1117–23.
Smith, B. T., Harvey, M. G., Faircloth, B. C., Glenn, T. C. and Brumfield, R. T. (2013). Target capture and massively parallel sequencing of ultraconserved elements for comparative studies at shallow evolutionary time scales. Systematic Biology, 63, 83–95.
Sousa, V. and Hey, J. (2013). Understanding the origin of species with genome-scale data: modelling gene flow. Nature Reviews Genetics, 14, 404–14.
Spang, A., Saw, J. H., Jørgensen, S. L., et al. (2015). Complex Archaea that bridge the gap between prokaryotes and eukaryotes. Nature, 521, 173–9.
Stamatakis, A. (2014). RAxML Version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30, 1312–13.
Stamatakis, A., Hoover, P. and Rougemont, J. (2008). A rapid bootstrap algorithm for the RAxML web servers. Systematic Biology, 57, 758–71.
Steel, M. (2005). Should phylogenetic models be trying to “fit an elephant”? Trends in Genetics, 21, 307–9.
Struck, T. H., Paul, C., Hill, N., et al. (2011). Phylogenomic analyses unravel annelid evolution. Nature, 471, 95–8.
Suchard, M. A. and Rambaut, A. (2009). Many-core algorithms for statistical phylogenetics. Bioinformatics, 25, 1370–6.
Swain, M. T., Tsai, I. J., Assefa, S. A., et al. (2012). A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs. Nature Protocols, 7, 1260–84.
Swofford, D. L., Olsen, G. J., Waddell, P. J. and Hillis, D. M. (1996). Phylogenetic inference. In Molecular Systematics, ed. Hillis, D. M., Moritz, C. and Mable, B. K. Sunderland, MA, Sinauer Associates; pp. 407–515.
Szöllősi, G. J., Tannier, E., Daubin, V. and Boussau, B. (2015). The inference of gene trees with species trees. Systematic Biology, 64, e42–e62.
Taylor, D. J. (2004). An assessment of accuracy, error, and conflict with support values from genome-scale phylogenetic data. Molecular Biology and Evolution, 21, 1534–7.
Telford, M. J., Bourlat, S. J., Economou, A., Papillon, D. and Rota-Stabelli, O. (2008). The evolution of the Ecdysozoa. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences, 363, 1529–37.
Tewhey, R., Warner, J. B., Nakano, M., et al. (2009). Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nature Biotechnology, 27, 1025–31.
The 1000 Genomes Project Consortium (2013). An integrated map of genetic variation from 1,092 human genomes. Nature, 490, 56–65.
Thompson, J. D., Linard, B., Lecompte, O. and Poch, O. (2011). A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives. PLoS One, 6, e18093.
Thompson, J. F. and Milos, P. M. (2011). The properties and applications of single-molecule DNA sequencing. Genome Biology, 12, 217.
Timme, R. E., Bachvaroff, T. R. and Delwiche, C. F. (2012). Broad phylogenomic sampling and the sister lineage of land plants. PLoS One, 7, e29696.
Treangen, T. J. and Salzberg, S. L. (2011). Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nature Reviews Genetics, 13, 36–46.
Trivedi, U. H. (2014). Quality control of next-generation sequencing data without a reference. Frontiers in Genetics, 5, 111.
Turner, E. H., Ng, S. B., Nickerson, D. A. and Shendure, J. (2009). Methods for genomic partitioning. Annual Review of Genomics and Human Genetics, 10, 263–84.
Vilella, A. J., Severin, J., Ureta-Vidal, A., et al. (2008). EnsemblCompara genetrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Research, 19, 327–35.
Vitti, J. J., Grossman, S. R. and Sabeti, P. C. (2013). Detecting natural selection in genomic data. Annual Review of Genetics, 47, 97–120.
Watson, M. (2014). Quality assessment and control of high-throughput sequencing data. Frontiers in Genetics, 5, 235.
Westesson, O., Barquist, L. and Holmes, I. (2012). HandAlign: Bayesian multiple sequence alignment, phylogeny and ancestral reconstruction. Bioinformatics, 28, 1170–1.
Wheeler, W. C. and Gladstein, D. S. (1994). MALIGN: a multiple sequence alignment program. Journal of Heredity, 85, 417–18.
Whelan, N. V., Kocot, K. M., Moroz, L. L. and Halanych, K. M. (2015). Error, signal, and the placement of Ctenophora sister to all other animals. Proceedings of the National Academy of Sciences of the United States of America, 112, 5773–8.
Whelan, S. (2008). Spatial and temporal heterogeneity in nucleotide sequence evolution. Molecular Biology and Evolution, 25, 1683–94.
Whitelaw, C. A., Barbazuk, W. B., Pertea, G., et al. (2003). Enrichment of gene-coding sequences in maize by genome filtration. Science, 302, 2118–20.
Wiegmann, B. M., Trautwein, M. D., Winkler, I. S., et al. (2011). Episodic radiations in the fly tree of life. Proceedings of the National Academy of Sciences of the United States of America, 108, 5690–5.
Wilkinson, M. (2006). Identifying stable reference taxa for phylogenetic nomenclature. Zoologica Scripta, 35, 109–12.
Williams, T. A., Foster, P. G., Cox, C. J. and Embley, T. M. (2014). An archaeal origin of eukaryotes supports only two primary domains of life. Nature, 504, 231–6.
Williams, T. A., Foster, P. G., Nye, T. M. W., Cox, C. J. and Embley, T. M. (2012). A congruent phylogenomic signal places eukaryotes within the Archaea. Proceedings of the Royal Society B-Biological Sciences, 279, 4870–9.
Wong, K. M., Suchard, M. A. and Huelsenbeck, J. P. (2008). Alignment uncertainty and genomic analysis. Science, 319, 473–6.
Wood, D. E. and Salzberg, S. L. (2014). Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology, 15, R46.
Wu, M., Chatterji, S. and Eisen, J. A. (2012). Accounting for alignment uncertainty in phylogenomics. PLoS One, 7, e30288.
Wu, M. and Eisen, J. A. (2008). A simple, fast, and accurate method of phylogenomic inference. Genome Biology, 9, R151.
Yalcin, B., Adams, D. J., Flint, J. and Keane, T. M. (2012). Next-generation sequencing of experimental mouse strains. Mammalian Genome, 23, 490–8.
Yang, Z. (1996a). Maximum-likelihood models for combined analyses of multiple sequence data. Journal of Molecular Evolution, 42, 587–96.
Yang, Z. (1996b). Among-site rate variation and its impact on phylogenetic analyses. Trends in Ecology and Evolution, 11, 367–72.
Zerbino, D. R. and Birney, E. (2008). Velvet: algorithms for de novo short read assembly using De Bruijn graphs. Genome Research, 18, 821–9.
Zhou, X. and Rokas, A. (2014). Prevention, diagnosis and treatment of high-throughput sequencing data pathologies. Molecular Ecology, 23, 1679–700.