This chapter describes genome sequence data and explains the relevance of the statistics, computation and algebra that we have discussed in Chapters 1–3 to understanding the function of genomes and their evolution. It sets the stage for the studies in biological sequence analysis in some of the later chapters.
Given that quantitative methods play an increasingly important role in many different aspects of biology, the question arises: why the emphasis on genome sequences? The most significant answer is that genomes are fundamental objects that carry instructions for the self-assembly of living organisms. Ultimately, our understanding of human biology will be based on an understanding of the organization and function of our genome. Another reason to focus on genomes is the abundance of high-fidelity data. Current finished genome sequences have less than one error in 10,000 bases. Statistical methods can therefore be directly applied to modeling the random evolution of genomes and to making inferences about the structure and organization of functional elements; there is no need to worry about extracting signal from noisy data. Furthermore, it is possible to validate findings with laboratory experiments.
The rate of accumulation of genome sequence data has been extraordinary, far outpacing Moore's law for the increasing density of transistors on circuit chips. This is due to breakthroughs in sequencing technologies and radical advances in automation. Since the first completion of the genome of a free-living organism in 1995 (Haemophilus influenzae [Fleischmann et al., 1995]), biologists have completely sequenced over 200 microbial genomes, and dozens of complete invertebrate and vertebrate genomes.
This volume on matrix algebra and its companion volume on statistics are the first two volumes of the Econometric Exercises Series. The two books contain exercises in matrix algebra, probability, and statistics, relating to course material that students are expected to know while enrolled in an (advanced) undergraduate or a postgraduate course in econometrics.
When we started writing this volume, our aim was to provide a collection of interesting exercises with complete and rigorous solutions. In fact, we wrote the book that we — as students — would have liked to have had. Our intention was not to write a textbook, but to supply material that could be used together with a textbook. But as the volume developed we discovered that we had in fact written a textbook, albeit one organized in a completely different manner. Thus, we do provide and prove theorems in this volume, because continually referring to other texts seemed undesirable. The volume can thus be used either as a self-contained course in matrix algebra or as a supplementary text.
We have attempted to develop new ideas slowly and carefully. The important ideas are introduced algebraically and sometimes geometrically, but also through examples. It is our experience that most students find it easier to assimilate the material through examples rather than by the theoretical development only.
Some of the statistical models introduced in Chapter 1 have the feature that, aside from the observed data, there is hidden information that cannot be determined from an observation. In this chapter we consider graphical models with hidden variables, such as the hidden Markov model and the hidden tree model. A natural problem in such models is to determine, given a particular observation, what is the most likely hidden data (which is called the explanation) for that observation. This problem is called MAP inference (Remark 4.13). Any fixed values of the parameters determine a way to assign an explanation to each possible observation. A map obtained in this way is called an inference function.
Examples of inference functions include gene-finding functions, which were discussed in [Pachter and Sturmfels, 2005, Section 5]. These inference functions of a hidden Markov model are used to identify gene structures in DNA sequences (see Section 4.4). An observation in such a model is a sequence over the alphabet Σ′ = {A, C, G, T}.
After a short introduction to inference functions, we present the main result of this chapter in Section 9.2. We call it the Few Inference Functions Theorem, and it states that in any graphical model the number of inference functions grows polynomially if the number of parameters is fixed. This theorem shows that most functions from the set of observations to possible values of the hidden data cannot be inference functions for any choice of the model parameters.
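To make MAP inference concrete, here is a minimal sketch of the Viterbi algorithm in a toy two-state hidden Markov model over the DNA alphabet. The state names and all parameter values are illustrative assumptions, not taken from the text; the algorithm returns the explanation (most likely hidden path) for an observed sequence.

```python
# Sketch of MAP inference (Viterbi) in a toy two-state HMM over DNA.
# States and parameter values are hypothetical, chosen for illustration.
import math

states = ("gene", "intergenic")
trans = {("gene", "gene"): 0.9, ("gene", "intergenic"): 0.1,
         ("intergenic", "gene"): 0.1, ("intergenic", "intergenic"): 0.9}
emit = {"gene":       {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
        "intergenic": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}
init = {"gene": 0.5, "intergenic": 0.5}

def viterbi(obs):
    """Return the most likely hidden path (the explanation) for obs."""
    # v[s] = best log-probability of any hidden path ending in state s
    v = {s: math.log(init[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []                       # back-pointers, one dict per position
    for ch in obs[1:]:
        prev, v, ptr = v, {}, {}
        for s in states:
            best = max(prev, key=lambda r: prev[r] + math.log(trans[(r, s)]))
            ptr[s] = best
            v[s] = prev[best] + math.log(trans[(best, s)]) + math.log(emit[s][ch])
        back.append(ptr)
    # trace the optimal path backwards through the pointers
    path = [max(v, key=v.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi("GGGCGCAATT"))
```

Fixing the parameters, as above, determines one inference function: the map sending each observed sequence to its explanation.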
We present a new, statistically consistent algorithm for phylogenetic tree construction that uses the algebraic theory of statistical models (as developed in Chapters 1 and 3). Our basic tool is Singular Value Decomposition (SVD) from numerical linear algebra.
Starting with an alignment of n DNA sequences, we show that SVD allows us to quickly decide whether a split of the taxa occurs in their phylogenetic tree, assuming only that evolution follows a tree Markov model. Using this fact, we have developed an algorithm to construct a phylogenetic tree by computing only O(n²) SVDs.
We have implemented this algorithm using the SVDLIBC library (available at http://tedlab.mit.edu/~dr/SVDLIBC/) and have done extensive testing with simulated and real data. The algorithm is fast in practice on trees with 20–30 taxa.
We begin by describing the general Markov model and then show how to flatten the joint probability distribution along a partition of the leaves in the tree. We give rank conditions for the resulting matrix; most notably, we give a set of new rank conditions that are satisfied by non-splits in the tree. Armed with these rank conditions, we present the tree-building algorithm, using SVD to calculate how close a matrix is to a certain rank. Finally, we give experimental results on the behavior of the algorithm with both simulated and real-life (ENCODE) data.
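The core numerical step above can be sketched as follows, using synthetic data rather than a real alignment. For a 4-state model, a flattening along a true split has low rank, so by the Eckart–Young theorem the distance from the flattening to the nearest rank-4 matrix is the norm of its singular values beyond the fourth. The matrix sizes and noise level here are illustrative assumptions.

```python
# Sketch of the SVD-based rank test: measure how close a flattening
# matrix is to rank 4 (the bound satisfied by a true split under a
# 4-state Markov model). The data below are synthetic, for illustration.
import numpy as np

rng = np.random.default_rng(0)

# A flattening for a split A|B arranges the joint distribution as a
# matrix with rows indexed by states of A and columns by states of B.
# Simulate one that is exactly rank 4 (a true split) plus slight noise.
low_rank = rng.random((16, 4)) @ rng.random((4, 16))
flattening = low_rank / low_rank.sum() + rng.normal(0, 1e-4, (16, 16))

def distance_from_rank(M, r):
    """Frobenius distance from M to the nearest rank-r matrix
    (Eckart-Young): the norm of the singular values beyond the r-th."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sqrt((s[r:] ** 2).sum()))

print(distance_from_rank(flattening, 4))            # small for a true split
print(distance_from_rank(rng.random((16, 16)), 4))  # large for a non-split
```

An algorithm along these lines would compute this distance for each candidate split and prefer splits whose flattenings are nearly rank 4.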
The general Markov model
We assume that evolution follows a tree Markov model, as introduced in Section 1.4, with evolution acting independently at different sites of the genome.
Graphical models are powerful statistical tools that have been applied to a wide variety of problems in computational biology: sequence alignment, ancestral genome reconstruction, etc. A graphical model consists of a graph whose vertices have associated random variables representing biological objects, such as entries in a DNA sequence, and whose edges have associated parameters that model transition or dependence relations between the random variables at the nodes. In many cases we will know the contents of only a subset of the model vertices, the observed random variables, and nothing about the contents of the remaining ones, the hidden random variables. A common example is a phylogenetic tree on a set of current species with given DNA sequences, but with no information about the DNA of their extinct ancestors. The task of finding the most likely set of values of the hidden random variables (also known as the explanation) given the set of observed random variables and the model parameters, is known as inference in graphical models.
Clearly, inference drawn about the hidden data is highly dependent on the topology and parameters (transition probabilities) of the graphical model. The topology of the model will be determined by the biological process being modeled, while the assumptions one can make about the nature of evolution, site mutation and other biological phenomena, allow us to restrict the space of possible transition probabilities to certain parameterized families. This raises several questions.
Using homologous sequences from eight vertebrates, we present a concrete example of the estimation of mutation rates in the models of evolution introduced in Chapter 4. We detail the process of data selection from a multiple alignment of the ENCODE regions, and compare rate estimates for each of the models in the Felsenstein hierarchy of Figure 4.7. We also address a standing problem in vertebrate evolution, namely the resolution of the phylogeny of the Eutherian orders, and discuss several challenges of molecular sequence analysis in inferring the phylogeny of this subclass. In particular, we consider the question of the position of the rodents relative to the primates, carnivores and artiodactyls; we affectionately dub this question the rodent problem.
Estimating mutation rates
Given an alignment of sequence homologs from various taxa, and an evolutionary model from Section 4.5, we are naturally led to ask the question, “what tree (with what branch lengths) and what values of the parameters in the rate matrix for that model are suggested by the alignment?” One answer to this question, the so-called maximum-likelihood solution, is, “the tree and rate parameters which maximize the probability that the given alignment would be generated by the given model.” (See also Sections 1.3 and 3.3.)
There are a number of available software packages which attempt to find, to varying degrees, this maximum-likelihood solution. For example, for a few of the most restrictive models in the Felsenstein hierarchy, the package PHYLIP [Felsenstein, 2004] will very efficiently search the tree space for the maximum-likelihood tree and rate parameters.
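For the single most restrictive model in the hierarchy, the Jukes–Cantor model, the maximum-likelihood distance between two aligned sequences even has a closed form, which gives a feel for what these packages compute. A minimal sketch (the example sequences are made up for illustration):

```python
# Closed-form maximum-likelihood distance under the Jukes-Cantor model:
# d = -(3/4) ln(1 - (4/3) p), where p is the fraction of differing sites.
# Example sequences are hypothetical.
import math

def jukes_cantor_distance(seq1, seq2):
    """ML estimate of expected substitutions per site under Jukes-Cantor."""
    assert len(seq1) == len(seq2), "sequences must be aligned"
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:                  # saturation: the estimate diverges
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)

print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTTC"))
```

For richer models in the hierarchy no such closed form exists, which is why the branch lengths and rate parameters must be found by numerical likelihood maximization.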
This chapter is the most abstract of the book. You may skip it at first reading, and jump directly to Chapter 4. But make sure you return to it later. Matrix theory can be viewed from an algebraic viewpoint or from a geometric viewpoint — both are equally important. The theory of vector spaces is essential in understanding the geometric viewpoint.
Associated with every vector space is a set of scalars, used to define scalar multiplication on the space. In the most abstract setting these scalars are required only to be elements of an algebraic field. We shall, however, always take the scalars to be the set of complex numbers (complex vector space) or, as an important special case, the set of real numbers (real vector space).
A vector space (or linear space) V is a nonempty set of elements (called vectors) together with two operations and a set of axioms. The first operation is addition, which associates with any two vectors x, y ∈ V a vector x + y ∈ V (the sum of x and y). The second operation is scalar multiplication, which associates with any vector x ∈ V and any real (or complex) scalar α, a vector αx ∈ V. It is the scalars (rather than the vectors) that determine whether the space is real or complex.
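The two operations can be illustrated concretely in the complex vector space C², using Python's built-in complex numbers (the particular vectors and scalar below are our own example values):

```python
# The two vector-space operations in C^2, with a spot check of one of
# the axioms (distributivity of scalar multiplication over addition).
# Example vectors and scalar are chosen arbitrarily for illustration.
def add(x, y):
    """Vector addition: componentwise sum."""
    return tuple(a + b for a, b in zip(x, y))

def scale(alpha, x):
    """Scalar multiplication: multiply every component by alpha."""
    return tuple(alpha * a for a in x)

x = (1 + 2j, 3 - 1j)
y = (0 + 1j, 2 + 0j)
alpha = 2 - 1j                      # a complex scalar: the space is complex

# Both results again lie in C^2 (closure), and alpha*(x+y) = alpha*x + alpha*y.
assert scale(alpha, add(x, y)) == add(scale(alpha, x), scale(alpha, y))
```

Restricting the scalar α to real values would make the same componentwise operations those of a real vector space, illustrating that it is the scalars that fix the type of the space.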