  • Print publication year: 2005
  • Online publication date: August 2010

1 - Statistics

from Part I - Introduction to the four themes


Statistics is the science of data analysis. The data to be encountered in this book are derived from genomes. Genomes consist of long chains of DNA which are represented by sequences in the letters A, C, G or T. These abbreviate the four nucleotide bases adenine, cytosine, guanine and thymine, which serve as fundamental building blocks in molecular biology.

What do statisticians do with their data? They build models of the process that generated the data and, in what is known as statistical inference, draw conclusions about this process. Genome sequences are particularly interesting data to draw conclusions from: they are the blueprint for life, and yet their function, structure, and evolution are poorly understood. Statistical models are fundamental for genomics, a point of view that was emphasized in [Durbin et al., 1998].
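As a small illustration of what building a model and drawing conclusions can mean for sequence data, the sketch below assumes the simplest possible model: each letter of a DNA sequence is drawn independently from one fixed distribution on {A, C, G, T}. Under that assumption the maximum likelihood estimate of the distribution is just the vector of relative letter frequencies. The function name `estimate_frequencies` and the example sequence are hypothetical, chosen only for illustration; this is not a model taken from the chapter.

```python
from collections import Counter

ALPHABET = "ACGT"  # the four letters used to write DNA sequences


def estimate_frequencies(sequence):
    """Estimate letter probabilities under an assumed i.i.d. model.

    For independent draws from a fixed distribution on ALPHABET, the
    maximum likelihood estimate is the relative frequency of each letter.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    counts = Counter(sequence)
    unknown = set(counts) - set(ALPHABET)
    if unknown:
        raise ValueError(f"unexpected letters: {sorted(unknown)}")
    total = len(sequence)
    # Counter returns 0 for letters that never occur in the sequence.
    return {letter: counts[letter] / total for letter in ALPHABET}


# Made-up toy sequence, not real genomic data.
toy_sequence = "ACGTACGGTA"
print(estimate_frequencies(toy_sequence))
```

Real genome models are far richer than this one, but the pattern is the same: fix a family of probability distributions, then use the observed sequence to pick out the member of the family that best explains it.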

The inference tools we present in this chapter look different from those found in [Durbin et al., 1998], or most other texts on computational biology or mathematical statistics: ours are written in the language of abstract algebra. The algebraic language for statistics clarifies many of the ideas central to the analysis of discrete data, and, within the context of biological sequence analysis, unifies the main ingredients of many widely used algorithms.

Algebraic Statistics is a new field, less than a decade old, whose precise scope is still emerging. The term itself was coined by Giovanni Pistone, Eva Riccomagno and Henry Wynn in the title of their book [Pistone et al., 2000].

Algebraic Statistics for Computational Biology
  • Online ISBN: 9780511610684