Small Trees and Generalized Neighbor-Joining

doi:10.1017/CBO9780511610684.022

18 - Small Trees and Generalized Neighbor-Joining

from Part II - Studies on the four themes

Published online by Cambridge University Press: 04 August 2010

Mark Contois and Dan Levy

Edited by L. Pachter and B. Sturmfels


L. Pachter: University of California, Berkeley
B. Sturmfels: University of California, Berkeley


Summary

Direct reconstruction of phylogenetic trees by maximum likelihood methods is computationally prohibitive for trees with many taxa; however, by computing all trees for subsets of taxa of size m, we can infer the entire tree. In particular, if m = 2, the traditional distance-based methods such as neighbor-joining [Saitou and Nei, 1987] and UPGMA [Sneath and Sokal, 1973] are applicable. Under distance-based methods, 2-leaf subtrees are completely determined by the total length between each pair of leaves. We extend this idea to m leaves by developing the notion of m-dissimilarity [Pachter and Speyer, 2004]. By building trees on subsets of size m of the taxa and finding the total length, we can obtain an m-dissimilarity map. We will explain the generalized neighbor-joining (GNJ) algorithm [Levy et al., 2005] for obtaining a phylogenetic tree with edge lengths from an m-dissimilarity map.
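To make the notion of an m-dissimilarity map concrete, the following is a minimal sketch that computes one from a tree whose edge lengths are already known. It is not the GNJ algorithm and not the authors' code; the tree encoding, the function names (`edge_splits`, `m_dissimilarity_map`) and the example quartet tree `EDGES` with its edge lengths are all assumptions made for illustration. The value assigned to a subset of m leaves is the total length of the smallest subtree spanning them, which is the sum of the lengths of the edges that separate at least two of the chosen leaves.

```python
from itertools import combinations

def edge_splits(edges, leaves):
    """For each edge (u, v, w), return (leaves on u's side of the edge, w)."""
    adj = {}
    for u, v, _ in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    splits = []
    for u, v, w in edges:
        # Walk from u with v marked as visited, so the walk never crosses (u, v).
        seen = {u, v}
        stack = [u]
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        seen.discard(v)
        splits.append((frozenset(seen & leaves), w))
    return splits

def m_dissimilarity_map(edges, leaves, m):
    """Map each m-subset of leaves to the total length of its spanning subtree."""
    leaves = set(leaves)
    splits = edge_splits(edges, leaves)
    result = {}
    for subset in combinations(sorted(leaves), m):
        chosen = set(subset)
        total = 0.0
        for side, w in splits:
            k = len(chosen & side)
            # The edge lies in the spanning subtree iff it separates chosen leaves.
            if 0 < k < m:
                total += w
        result[subset] = total
    return result

# Hypothetical quartet tree: leaves a, b attach to internal node x; c, d to y.
EDGES = [("a", "x", 1), ("b", "x", 2), ("x", "y", 3), ("c", "y", 4), ("d", "y", 5)]
LEAVES = ["a", "b", "c", "d"]

if __name__ == "__main__":
    for m in (2, 3):
        print(m, m_dissimilarity_map(EDGES, LEAVES, m))
```

For m = 2 this reduces to the usual pairwise path lengths used by distance-based methods; for m = 3 each entry is the total length of a three-leaf subtree.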

This algorithm is consistent: given an m-dissimilarity map D_T that comes from a tree T, GNJ returns the correct tree. However, in the case of data that is “noisy”, e.g., when the observed dissimilarity map does not lie in the space of trees, the accuracy of GNJ depends on the reliability of the subtree lengths. Numerical methods may run into trouble when models are of high degree (Section 1.3); exact methods for computing subtrees, therefore, could only serve to improve the accuracy of GNJ. One family of such methods consists of algorithms for finding critical points of the ML equations as discussed in Chapter 15 and in [Hoşten et al., 2005].
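As an illustration of what it means for a dissimilarity map to fall outside the space of trees, the sketch below covers only the classical m = 2 case, using the standard four-point condition: for any four taxa, the two largest of the three pairwise sums must be equal. This is a necessary check on quadruples of distinct taxa, not the m-dissimilarity criterion developed in the chapter; the function name `satisfies_four_point`, the tolerance `tol`, and the example values are assumptions for illustration.

```python
from itertools import combinations

def satisfies_four_point(d, taxa, tol=1e-9):
    """Check the four-point condition on every quadruple of distinct taxa.

    d maps frozenset({i, j}) to the dissimilarity between taxa i and j.
    """
    for i, j, k, l in combinations(taxa, 4):
        sums = sorted([
            d[frozenset((i, j))] + d[frozenset((k, l))],
            d[frozenset((i, k))] + d[frozenset((j, l))],
            d[frozenset((i, l))] + d[frozenset((j, k))],
        ])
        # The two largest sums must agree (up to tol) for a tree metric.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

TAXA = ["a", "b", "c", "d"]

# Hypothetical pairwise distances that do come from a quartet tree.
TREE_LIKE = {
    frozenset(("a", "b")): 3.0, frozenset(("a", "c")): 8.0,
    frozenset(("a", "d")): 9.0, frozenset(("b", "c")): 9.0,
    frozenset(("b", "d")): 10.0, frozenset(("c", "d")): 9.0,
}

# The same map with one entry perturbed, standing in for "noisy" data.
NOISY = dict(TREE_LIKE)
NOISY[frozenset(("a", "c"))] = 8.5

if __name__ == "__main__":
    print(satisfies_four_point(TREE_LIKE, TAXA))
    print(satisfies_four_point(NOISY, TAXA))
```

With real data the condition will almost never hold exactly, which is why the accuracy of the input subtree lengths matters for the reconstructed tree.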

Information

Type: Chapter
Information: Algebraic Statistics for Computational Biology, pp. 335-346

DOI: https://doi.org/10.1017/CBO9780511610684.022

Publisher: Cambridge University Press

Print publication year: 2005
