
1 - Perspective: Challenges in assembling the ‘next generation’ Tree of Life

from Part I - Next Generation Phylogenetics

Published online by Cambridge University Press:  05 June 2016

Michael J. Sanderson, University of Arizona, Tucson, USA
Peter D. Olson, Natural History Museum, London
Joseph Hughes, University of Glasgow
James A. Cotton, Wellcome Trust Sanger Institute, Cambridge

Summary

Make no little plans

Attributed to Daniel Burnham, Chicago architect

Introduction

Phylogenetic trees are getting large. Trees based on single loci have been constructed for > 100 000 taxa (Price et al. 2010), and trees based on a handful of loci for > 10 000 taxa (Goloboff et al. 2009; Smith et al. 2011a). Basic counting arguments show that the number of loci needed to reconstruct a tree accurately increases with the number of leaves, N, in the tree (Mossel and Steel 2005, p. 400). Whether this scaling occurs at the conjectured rate of log(N) or is worse than that, the need for genome-scale datasets is likely to increase. Fortunately, the pace at which new sequence data are accumulating is extraordinary, and its revolutionary impact on systematics has been noted many times (e.g. Goldman and Yang 2008). What is perhaps more noteworthy is that taxon sampling has been keeping pace with advances in sequencing technology, so that the size of phylogenetic datasets has been steadily increasing in both dimensions, taxa and loci. Figure 1.1 shows the expanding wave front of phylogenetic dataset size, a kind of ‘Moore's Law’ for phylogenomics. This pattern undoubtedly has its limits. Goldman and Yang (2008) documented the exponential growth in the number of sequences in databases, but cautioned that molecular phylogenetic studies are accumulating at a rate that is less than exponential. This is probably due to a combination of the mean number of sequences per study increasing over time (Fig. 1.1) and the inevitably increasing difficulty of obtaining samples of rare taxa. Given the ‘hollow curve’ of species distributions, that is, the fact that most species are both geographically restricted and locally uncommon (McGill 2010), it is doubtful that sampling across taxa will be able to keep up with sampling across individual genomes. Nonetheless, as of March 2016, ~19% of described biodiversity has at least one sequence in GenBank (355 000 species out of 1.9 million).
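The counting argument mentioned above can be illustrated with a rough sketch. It is only one simple way to obtain such a bound and is not necessarily the argument given by Mossel and Steel (2005). It assumes unrooted binary trees on N labelled leaves, of which there are (2N − 5)!!, and it uses alignment columns (sites) over four nucleotide states as a crude stand-in for loci. A column can carry at most N · log2(4) bits, so at least log2((2N − 5)!!) / (2N) columns are needed to single out one tree. The function names are hypothetical.

```python
import math


def log2_num_unrooted_binary_trees(n_leaves: int) -> float:
    """Return log2 of (2N - 5)!!, the number of unrooted binary trees
    on N labelled leaves (N >= 3)."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    # (2N - 5)!! = 3 * 5 * ... * (2N - 5); the product is empty (= 1) for N = 3.
    return sum(math.log2(k) for k in range(3, 2 * n_leaves - 4, 2))


def min_sites_lower_bound(n_leaves: int, n_states: int = 4) -> float:
    """Crude information-theoretic lower bound on the number of alignment
    columns needed to distinguish all binary trees on N leaves.

    Assumption: each column holds at most N * log2(n_states) bits.
    """
    bits_needed = log2_num_unrooted_binary_trees(n_leaves)
    bits_per_site = n_leaves * math.log2(n_states)
    return bits_needed / bits_per_site


if __name__ == "__main__":
    for n in (10, 100, 10_000, 100_000):
        print(n, round(min_sites_lower_bound(n), 2))
```

Under these assumptions the bound grows roughly like (1/2) · log2(N), which matches the flavour of the log(N) conjecture. It is a lower bound only: real data are far less informative per site, so actual requirements can be much higher.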

There are many reasons to add genome-scale data to phylogenetic inference in local problems in the Tree of Life, or to solidify its deep backbone with a small number of exemplars, but this paper focuses on the task of building large, species-rich, high-resolution phylogenies.

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2016


References

Ané, C., Larget, B., Baum, D. A., Smith, S. D. and Rokas, A. (2007). Bayesian estimation of concordance among gene trees. Molecular Biology and Evolution, 24, 412–26.
Bansal, M., Burleigh, J. G., Eulenstein, O. and Wehe, A. (2007). Heuristics for the gene-duplication problem, An O(N) speed-up for the local search. In RECOMB 2007, ed. Speed, T. and Huang, H. Heidelberg, Springer; pp. 238–252.
Barker, M. S., Baute, G. J. and Liu, S.-L. (2012). Duplications and turnover in plant genomes. In Plant Genome Diversity, ed. Wendel, J. F. Vienna, Springer; pp. 155–69.
Bininda-Emonds, O. R. P., Brady, S. G., Kim, J. and Sanderson, M. J. (2001). Scaling of accuracy in extremely large phylogenetic trees. Pacific Symposium on Biocomputing, 6, 547–58.
Bremer, B., Bremer, K., Chase, M. W., et al. (2009). An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants, APG III. Botanical Journal of the Linnean Society, 161, 105–21.
Burleigh, J. G., Bansal, M. S., Eulenstein, O., Hartmann, S., Wehe, A. and Vision, T. J. (2011). Genome-scale phylogenetics, inferring the plant tree of life from 18,896 gene trees. Systematic Biology, 60, 117–25.
Burleigh, J. G., Bansal, M. S., Wehe, A. and Eulenstein, O. (2009). Locating large-scale gene duplication events through reconciled trees, implications for identifying ancient polyploidy events in plants. Journal of Computational Biology, 16, 1071–83.
Chase, M. W., Soltis, D. E., Olmstead, R. G., et al. (1993). Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden, 80, 528–80.
Cranston, K. A., Hurwitz, B., Sanderson, M. J., Ware, D., Wing, R. A. and Stein, L. (2010). Phylogenomic analysis from deep BAC-end sequence libraries in rice. Systematic Botany, 35, 512–23.
Davidson, R., Vachaspati, P., Mirarab, S. and Warnow, T. (2015). Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. BMC Genomics, 16(Suppl. 10):S1.
Demuth, J. P. and Hahn, M. W. (2009). The life and death of gene families. BioEssays, 31, 29–39.
Duarte, J. M., Wall, P. K., Edger, P. P., et al. (2010). Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels. BMC Evolutionary Biology, 10, 61.
Ebersberger, I., Galgoczy, P., Taudien, S., Taenzer, S., Platzer, M. and Von Haeseler, A. (2007). Mapping human genetic ancestry. Molecular Biology and Evolution, 24, 2266–76.
Edgar, R. C. (2004). MUSCLE, multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32, 1792–7.
Edwards, E. J. and Smith, S. A. (2010). Phylogenetic analyses reveal the shady history of C-4 grasses. Proceedings of the National Academy of Sciences of the United States of America, 107, 2532–7.
Fan, H. H. and Kubatko, L. S. (2011). Estimating species trees using approximate Bayesian computation. Molecular Phylogenetics and Evolution, 59, 354–63.
Felsenstein, J. (1978). Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology, 27, 401–10.
Felsenstein, J. (2004). Inferring Phylogenies. Sunderland, MA, Sinauer Press.
Fletcher, W. and Yang, Z. H. (2009). INDELible, a flexible simulator of biological sequence evolution. Molecular Biology and Evolution, 26, 1879–88.
Goldman, N. and Yang, Z. (2008). Introduction. Statistical and computational challenges in molecular phylogenetics and evolution. Philosophical Transactions of the Royal Society of London B – Biological Sciences, 363, 3889–92.
Goloboff, P. A., Catalano, S. A., Mirande, J. M., et al. (2009). Phylogenetic analysis of 73 060 taxa corroborates major eukaryotic groups. Cladistics, 25, 211–30.
Goodman, M., Czelusniak, J., Moore, G. W., Romeroherrera, A. E. and Matsuda, G. (1979). Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences. Systematic Zoology, 28, 132–63.
Guindon, S., Dufayard, J. F., Lefort, V., Anisimova, M., Hordijk, W. and Gascuel, O. (2010). New algorithms and methods to estimate maximum-likelihood phylogenies, assessing the performance of PhyML 3.0. Systematic Biology, 59, 307–21.
Hejnol, A., Obst, M., Stamatakis, A., et al. (2009). Assessing the root of bilaterian animals with scalable phylogenomic methods. Proceedings of the Royal Society B – Biological Sciences, 276, 4261–70.
Izquierdo-Carrasco, F., Smith, S. A. and Stamatakis, A. (2011). Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees. BMC Bioinformatics, 12, 470.
Källersjö, M., Farris, J. S., Chase, M. W., et al. (1998). Simultaneous parsimony jackknife analysis of 2538 rbcL DNA sequences reveals support for major clades of green plants, land plants, seed plants and flowering plants. Plant Systematics and Evolution, 213, 259–87.
Knowles, L. L. (2009). Estimating species trees, methods of phylogenetic analysis when there is incongruence across genes. Systematic Biology, 58, 463–7.
Liu, K., Linder, C. R. and Warnow, T. (2012). RAxML and FastTree, comparing two methods for large-scale maximum likelihood phylogeny estimation. PLoS One, 6, 11.
Liu, K., Warnow, T. J., Holder, M. T., Nelesen, S. M., Yu, J. Y., Stamatakis, A. P. and Linder, C. R. (2011). SATe-II, Very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees. Systematic Biology, 61, 90–106.
Liu, L. and Pearl, D. K. (2007). Species trees from gene trees: Reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Systematic Biology, 56, 504–14.
Liu, L., Xi, Z. X., Wu, S. Y., Davis, C. C. and Edwards, S. V. (2015). Estimating phylogenetic trees from genome-scale data. In Year in Evolutionary Biology, ed. Mousseau, T. A. and Fox, C. W. Annals of the New York Academy of Sciences, 1360: 36–53.
Liu, L., Yu, L. L., Kubatko, L., Pearl, D. K. and Edwards, S. V. (2009). Coalescent methods for estimating phylogenetic trees. Molecular Phylogenetics and Evolution, 53, 320–8.
Liu, L., Yu, L. L. and Pearl, D. K. (2010). Maximum tree, a consistent estimator of the species tree. Journal of Mathematical Biology, 60, 95–106.
Löytynoja, A. and Goldman, N. (2008). Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science, 320, 1632–5.
McGill, B. J. (2010). Towards a unification of unified theories of biodiversity. Ecology Letters, 13, 627–42.
McMahon, M. M. and Sanderson, M. J. (2006). Phylogenetic supermatrix analysis of GenBank sequences from 2228 papilionoid legumes. Systematic Biology, 55, 818–36.
Mossel, E., Roch, S. and Sly, A. (2011). On the inference of large phylogenies with long branches, how long is too long? Bulletin of Mathematical Biology, 73, 1627–44.
Mossel, E. and Steel, M. (2005). How much can evolved characters tell us about the tree that generated them? In Mathematics of Evolution and Phylogeny, ed. Gascuel, O. New York, Oxford University Press.
Page, R. D. M. and Charleston, M. A. (1997). From gene to organismal phylogeny, reconciled trees and the gene tree/species tree problem. Molecular Phylogenetics and Evolution, 7, 231–40.
Philippe, H., Snell, E., Bapteste, E., Lopez, P., Holland, P. and Casane, D. (2004). Phylogenomics of eukaryotes: Impact of missing data on large alignments. Molecular Biology and Evolution, 21, 1740–52.
Piel, W. H., Donoghue, M. J. and Sanderson, M. J. (2002). TreeBASE, a database of phylogenetic knowledge. In To the Interoperable “Catalog of Life”, ed. Shimura, J., Wilson, K. L. and Gordon, D. Tsukuba, Japan, National Institute for Environmental Studies; pp. 41–7.
Pollard, D. A., Iyer, V. N., Moses, A. M. and Eisen, M. B. (2006). Widespread discordance of gene trees with species tree in Drosophila: Evidence for incomplete lineage sorting. PLoS Genetics, 2, 1634–47.
Price, M. N., Dehal, P. S. and Arkin, A. P. (2010). FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One, 5, e9490.
Qiu, Y. L., Dombrovska, O., Lee, J., et al. (2005). Phylogenetic analyses of basal angiosperms based on nine plastid, mitochondrial, and nuclear genes. International Journal of Plant Sciences, 166, 815–42.
Reineke, A. R., Bornberg-Bauer, E. and Gu, J. (2011). Evolutionary divergence and limits of conserved non-coding sequence detection in plant genomes. Nucleic Acids Research, 39, 6029–43.
Roch, S. (2006). A short proof that phylogenetic tree reconstruction by maximum likelihood is hard. IEEE-ACM Transactions on Computational Biology and Bioinformatics, 3, 92–4.
Sanderson, M. J. (2008). Phylogenetic signal in the eukaryotic tree of life. Science, 321, 121–3.
Sanderson, M. J., Boss, D., Chen, D., Cranston, K. A. and Wehe, A. (2008). The PhyLoTA browser, processing GenBank for molecular phylogenetics research. Systematic Biology, 57, 335–46.
Sanderson, M. J. and McMahon, M. M. (2007). Inferring angiosperm phylogeny from EST data with widespread gene duplication. BMC Evolutionary Biology, 7 (Suppl. 1), S3.
Sanderson, M. J., McMahon, M. M. and Steel, M. (2010). Phylogenomics with incomplete taxon coverage, the limits to inference. BMC Evolutionary Biology, 10, 155.
Sanderson, M. J., McMahon, M. M. and Steel, M. (2011). Terraces in phylogenetic tree space. Science, 333, 448–50.
Sankoff, D., Zheng, C. F., Munoz, A., et al. (2010). Issues in the reconstruction of gene order evolution. Journal of Computer Science and Technology, 25, 10–25.
Semple, C. and Steel, M. (2003). Phylogenetics. New York, Oxford University Press.
Smith, S. A., Beaulieu, J. M., Stamatakis, A. and Donoghue, M. J. (2011a). Understanding angiosperm diversification using small and large phylogenetic trees. American Journal of Botany, 98, 404–14.
Smith, S. A., Wilson, N. G., Goetz, F. E., et al. (2011b). Resolving the evolutionary relationships of molluscs with phylogenomic tools. Nature, 480, 364–7.
Soltis, D. E., Smith, S. A., Cellinese, N., et al. (2011). Angiosperm phylogeny, 17 genes, 640 taxa. American Journal of Botany, 98, 704–30.
Soltis, D. E., Soltis, P. S., Chase, M. W., et al. (2000). Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Botanical Journal of the Linnean Society, 133, 381–461.
Stamatakis, A., Hoover, P. and Rougemont, J. (2008). A rapid bootstrap algorithm for the RAxML web servers. Systematic Biology, 57, 758–71.
Stamatakis, A. and Ott, M. (2008). Efficient computation of the phylogenetic likelihood function on multi-gene alignments and multi-core architectures. Philosophical Transactions of the Royal Society of London B – Biological Sciences, 363, 3977–84.
Steel, M. and Sanderson, M. J. (2010). Characterizing phylogenetically decisive taxon coverage. Applied Mathematics Letters, 23, 82–6.
Tautz, D. and Domazet-Loso, T. (2011). The evolutionary origin of orphan genes. Nature Reviews Genetics, 12, 692–702.
White, M. A., Ané, C., Dewey, C. N., Larget, B. R. and Payseur, B. A. (2009). Fine-scale phylogenetic discordance across the house mouse genome. PLoS Genetics, 5, e1000729.
Wiens, J. J. (2003). Missing data, incomplete taxa, and phylogenetic accuracy. Systematic Biology, 52, 528–38.
Wilkinson, M. (2003). Missing entries and multiple trees, instability, relationships and support in parsimony analysis. Journal of Vertebrate Paleontology, 23, 311–23.
