The quantitative analysis of biological sequence data is based on methods from statistics coupled with efficient algorithms from computer science. Algebra provides a framework for unifying many of the seemingly disparate techniques used by computational biologists. This book, first published in 2005, offers an introduction to this mathematical framework and describes tools from computational algebra for designing new algorithms that give exact, accurate results. These algorithms can be applied to biological problems such as aligning genomes, finding genes and constructing phylogenies. The first part of this book consists of four chapters on the themes of Statistics, Computation, Algebra and Biology, offering speedy, self-contained introductions to the emerging field of algebraic statistics and its applications to genomics. In the second part, the four themes are combined and developed to tackle real problems in computational genomics. As the first book in this exciting and dynamic area, it will be welcomed as a text for self-study or for advanced undergraduate and beginning graduate courses.

- First book in an exciting area at the intersection of computation, statistics, and genomics
- Has quick guides to background topics, then applies these in case studies at the forefront of research
- Includes links to online software and ancillary material from www.cambridge.org/9780521857000

### Contents

Preface

Part I. Introduction to the Four Themes

1. Statistics, L. Pachter and B. Sturmfels
2. Computation, L. Pachter and B. Sturmfels
3. Algebra, L. Pachter and B. Sturmfels
4. Biology, L. Pachter and B. Sturmfels

Part II. Studies on the Four Themes

5. Parametric inference, R. Mihaescu
6. Polytope propagation on graphs, M. Joswig
7. Parametric sequence alignment, C. Dewey and K. Woods
8. Bounds for optimal sequence alignment, S. Elizalde
9. Inference functions, S. Elizalde
10. Geometry of Markov chains, E. Kuo
11. Equations defining hidden Markov models, N. Bray and J. Morton
12. The EM algorithm for hidden Markov models, I. B. Hallgrímsdóttir, A. Milowski and J. Yu
13. Homology mapping with Markov random fields, A. Caspi
14. Mutagenetic tree models, N. Beerenwinkel and M. Drton
15. Catalog of small trees, M. Casanellas, L. Garcia and S. Sullivant
16. The strand symmetric model, M. Casanellas and S. Sullivant
17. Extending statistical models from trees to splits graphs, D. Bryant
18. Small trees and generalized neighbor-joining, M. Contois and D. Levy
19. Tree construction using Singular Value Decomposition, N. Eriksson
20. Applications of interval methods to phylogenetics, R. Sainudiin and R. Yoshida
21. Analysis of point mutations in vertebrate genomes, J. Al-Aidroos and S. Snir
22. Ultra-conserved elements in vertebrate genomes, M. Drton, N. Eriksson and G. Leung

Index

### Reviews

'As the first book in this exciting and dynamic area, it will be welcomed as a text for self-study or for advanced undergraduate and beginning graduate courses.' L'Enseignement Mathématique

'… substantial, enthusiastically presented, and confidently written …' Publication of the International Statistical Institute

'This book is of great interest to research workers, teachers and students in applied statistics, biology, medicine and genetics.' Zentralblatt MATH