Book contents
- Frontmatter
- Contents
- Preface
- Guide to the chapters
- Acknowledgment of support
- Part I Introduction to the four themes
- Part II Studies on the four themes
- 5 Parametric Inference
- 6 Polytope Propagation on Graphs
- 7 Parametric Sequence Alignment
- 8 Bounds for Optimal Sequence Alignment
- 9 Inference Functions
- 10 Geometry of Markov Chains
- 11 Equations Defining Hidden Markov Models
- 12 The EM Algorithm for Hidden Markov Models
- 13 Homology Mapping with Markov Random Fields
- 14 Mutagenetic Tree Models
- 15 Catalog of Small Trees
- 16 The Strand Symmetric Model
- 17 Extending Tree Models to Splits Networks
- 18 Small Trees and Generalized Neighbor-Joining
- 19 Tree Construction using Singular Value Decomposition
- 20 Applications of Interval Methods to Phylogenetics
- 21 Analysis of Point Mutations in Vertebrate Genomes
- 22 Ultra-Conserved Elements in Vertebrate and Fly Genomes
- References
- Index
18 - Small Trees and Generalized Neighbor-Joining
from Part II - Studies on the four themes
Published online by Cambridge University Press: 04 August 2010
Summary
Direct reconstruction of phylogenetic trees by maximum likelihood methods is computationally prohibitive for trees with many taxa; however, by computing all trees for subsets of taxa of size m, we can infer the entire tree. In particular, if m = 2, the traditional distance-based methods such as neighbor-joining [Saitou and Nei, 1987] and UPGMA [Sneath and Sokal, 1973] are applicable. Under distance-based methods, 2-leaf subtrees are completely determined by the total length between each pair of leaves. We extend this idea to m leaves by developing the notion of m-dissimilarity [Pachter and Speyer, 2004]. By building trees on subsets of size m of the taxa and finding the total length of each, we can obtain an m-dissimilarity map. We will explain the generalized neighbor-joining (GNJ) algorithm [Levy et al., 2005] for obtaining a phylogenetic tree with edge lengths from an m-dissimilarity map.
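To make the notion of an m-dissimilarity map concrete, the following is a minimal sketch, not taken from the chapter, that computes the map induced by a known weighted tree: for each m-subset of leaves it sums the lengths of the edges in the subtree spanning those leaves. The function names (`leaf_sides`, `m_dissimilarity`), the edge-list representation, and the example quartet tree with its edge lengths are all assumptions made for illustration.

```python
from itertools import combinations

def leaf_sides(edges, leaves):
    """For each edge (u, v, w), return (w, leaves on v's side once the edge is cut)."""
    adj = {}
    for u, v, _ in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    leaf_set = set(leaves)
    sides = []
    for u, v, w in edges:
        # Depth-first search from v that never crosses the edge back to u.
        seen, stack = {v}, [v]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen and not (x == v and y == u):
                    seen.add(y)
                    stack.append(y)
        sides.append((w, seen & leaf_set))
    return sides

def m_dissimilarity(edges, leaves, m):
    """Map each sorted m-subset of leaves to the total length of its spanning subtree."""
    sides = leaf_sides(edges, leaves)
    D = {}
    for subset in combinations(sorted(leaves), m):
        chosen = set(subset)
        total = 0.0
        for w, side in sides:
            # An edge lies in the spanning subtree exactly when it separates
            # the chosen leaves into two non-empty groups.
            if 0 < len(side & chosen) < m:
                total += w
        D[subset] = total
    return D

# Hypothetical quartet tree: leaves a, b attach to internal node x; c, d attach to y.
EDGES = [("a", "x", 1.0), ("b", "x", 2.0), ("x", "y", 3.0),
         ("c", "y", 4.0), ("d", "y", 5.0)]
LEAVES = ["a", "b", "c", "d"]

D2 = m_dissimilarity(EDGES, LEAVES, 2)  # ordinary pairwise distances
D3 = m_dissimilarity(EDGES, LEAVES, 3)  # total lengths of 3-leaf subtrees
```

For m = 2 this reduces to the usual path-length distance between two leaves. In practice the subtree lengths would be estimated from sequence data rather than read off a known tree; the sketch only illustrates the quantity being estimated.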
This algorithm is consistent: given an m-dissimilarity map D_T that comes from a tree T, GNJ returns the correct tree. However, in the case of data that is “noisy”, e.g., when the observed dissimilarity map does not lie in the space of trees, the accuracy of GNJ depends on the reliability of the subtree lengths. Numerical methods may run into trouble when models are of high degree (Section 1.3); exact methods for computing subtrees, therefore, could only serve to improve the accuracy of GNJ. One family of such methods consists of algorithms for finding critical points of the ML equations as discussed in Chapter 15 and in [Hoşten et al., 2005].
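For the classical case m = 2, whether a dissimilarity map lies in the space of trees can be checked with the standard four-point condition: for every four taxa, the two largest of the three pairwise sums must be equal. The sketch below is an illustration under that assumption only; it is not part of the GNJ algorithm as described in the chapter, it assumes a symmetric non-negative map, and the function name, the tolerance, and the example distances are hypothetical.

```python
from itertools import combinations

def satisfies_four_point(d, taxa, tol=1e-9):
    """Check the four-point condition for a pairwise map keyed by frozenset pairs."""
    for i, j, k, l in combinations(taxa, 4):
        sums = sorted([
            d[frozenset((i, j))] + d[frozenset((k, l))],
            d[frozenset((i, k))] + d[frozenset((j, l))],
            d[frozenset((i, l))] + d[frozenset((j, k))],
        ])
        # The two largest sums must coincide (up to the tolerance).
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

def pair_map(pairs):
    """Build a symmetric lookup from {(x, y): distance} using unordered keys."""
    return {frozenset(p): v for p, v in pairs.items()}

TAXA = ["a", "b", "c", "d"]

# Hypothetical distances realised by a quartet tree ab|cd.
tree_like = pair_map({("a", "b"): 3.0, ("a", "c"): 8.0, ("a", "d"): 9.0,
                      ("b", "c"): 9.0, ("b", "d"): 10.0, ("c", "d"): 9.0})

# The same map with one entry perturbed, standing in for "noisy" data.
noisy = dict(tree_like)
noisy[frozenset(("a", "d"))] = 9.5

is_tree_like = satisfies_four_point(tree_like, TAXA)
is_noisy_tree_like = satisfies_four_point(noisy, TAXA)
```

A perturbed map such as `noisy` fails the check, which is the situation in which the accuracy of a reconstruction depends on how reliable the input lengths are.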
- Type: Chapter
- Information: Algebraic Statistics for Computational Biology, pp. 335-346. Publisher: Cambridge University Press. Print publication year: 2005