Charles Stein shocked the statistical world in 1955 with his proof that maximum likelihood estimation methods for Gaussian models, in common use for more than a century, were inadmissible beyond simple one- or two-dimensional situations. These methods are still in use, for good reasons, but Stein-type estimators have pointed the way toward a radically different empirical Bayes approach to high-dimensional statistical inference. We will be using empirical Bayes ideas for estimation, testing, and prediction, beginning here with their path-breaking appearance in the James–Stein formulation.
Although the connection was not immediately recognized, Stein's work was half of an energetic post-war empirical Bayes initiative. The other half, explicitly named “empirical Bayes” by its principal developer Herbert Robbins, was less shocking but more general in scope, aiming to show how frequentists could achieve full Bayesian efficiency in large-scale parallel studies. Large-scale parallel studies were rare in the 1950s, however, and Robbins' theory did not have the applied impact of Stein's shrinkage estimators, which are useful in much smaller data sets.
All of this has changed in the 21st century. New scientific technologies, epitomized by the microarray, routinely produce studies of thousands of parallel cases — we will see several such studies in what follows — well suited for the Robbins point of view. That view predominates in the succeeding chapters, though Robbins' methodology is not explicitly invoked until the very last section of the book.
Microarray experiments, through a combination of insufficient data per gene and the difficulties of large-scale simultaneous inference, often yield disappointing results. In search of greater detection power, enrichment analysis considers the combined outcomes of biologically determined sets of genes, for example the set of all the genes in a predefined genetic pathway. If all 20 z-values in a hypothetical pathway were positive, we might assign significance to the pathway's effect, whether or not any of the individual z-values were deemed non-null. We will consider enrichment methods in this chapter, along with some of the underlying theory, which of course applies just as well to similar situations outside the microarray context.
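To see why consistent signs alone can be striking, the sketch below works out the chance that all 20 z-values in a pathway would be positive if each were independently positive or negative with probability 1/2. This is only a back-of-the-envelope illustration under a strong independence assumption, not the enrichment statistic developed later in the chapter.

```python
from scipy.stats import binom

# Hypothetical pathway of m = 20 genes, all observed with positive z-values.
m = 20
k_positive = 20

# Under a symmetric null, each z-value is positive with probability 1/2,
# independently across genes (a strong assumption; real z-values are correlated).
p_all_positive = binom.sf(k_positive - 1, m, 0.5)  # P(X >= 20) for X ~ Bin(20, 1/2)
print(f"P(all {m} z-values positive under the null) = {p_all_positive:.2e}")
# roughly 9.5e-07: striking for the pathway even if no single z-value is significant
```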
Our main example concerns the p53 data, partially illustrated in Figure 9.1; p53 is a transcription factor, that is, a gene that controls the activity of other genes. Mutations in p53 have been implicated in cancer development. A National Cancer Institute microarray study compared 33 mutated cell lines with 17 in which p53 was unmutated. There were N = 10,100 gene expressions measured for each cell line, yielding a 10,100 × 50 matrix X of expression measurements. Z-values based on two-sample t-tests were computed for each gene, as in (2.1)–(2.5), comparing mutated with unmutated cell lines. Figure 9.1 displays the 10,100 z-values.
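The sketch below shows the kind of per-gene calculation involved, run on simulated data standing in for the 10,100 × 50 expression matrix X. The 33/17 column split follows the description above; the conversion of each two-sample t-statistic to a z-value through the t and normal CDFs is one standard convention, offered in the spirit of (2.1)–(2.5) rather than as a transcription of them.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N, n1, n2 = 10_100, 33, 17          # genes; mutated and unmutated cell lines
X = rng.normal(size=(N, n1 + n2))   # simulated stand-in for the expression matrix

mut, unmut = X[:, :n1], X[:, n1:]

# Two-sample t-statistic for each gene (each row), equal-variance version.
t, _ = stats.ttest_ind(mut, unmut, axis=1)

# Convert each t-statistic to a z-value: push it through the t CDF with
# n1 + n2 - 2 degrees of freedom, then through the inverse normal CDF.
df = n1 + n2 - 2
z = stats.norm.ppf(stats.t.cdf(t, df))

print(z[:5])                        # a few of the N z-values
```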
At the risk of drastic oversimplification, the history of statistics as a recognized discipline can be divided into three eras:
• The age of Quetelet and his successors, in which huge census-level data sets were brought to bear on simple but important questions: Are there more male than female births? Is the rate of insanity rising?
• The classical period of Pearson, Fisher, Neyman, Hotelling, and their successors, intellectual giants who developed a theory of optimal inference capable of wringing every drop of information out of a scientific experiment. The questions dealt with still tended to be simple — Is treatment A better than treatment B? — but the new methods were suited to the kinds of small data sets individual scientists might collect.
• The era of scientific mass production, in which new technologies typified by the microarray allow a single team of scientists to produce data sets of a size Quetelet would envy. But now the flood of data is accompanied by a deluge of questions, perhaps thousands of estimates or hypothesis tests that the statistician is charged with answering together; not at all what the classical masters had in mind.
In classical significance testing, the null distribution plays the role of devil's advocate: a standard that the observed data must exceed in order to convince the scientific world that something interesting has occurred. We observe, say, z = 2, and note that in a hypothetical “long run” of observations from a N(0, 1) distribution less than 2.5% of the draws would exceed 2, thereby discrediting the uninteresting null distribution as an explanation.
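For concreteness, the tail probability quoted above is just the standard normal survival function evaluated at z = 2; the short check below computes it directly and involves nothing specific to the book's methods.

```python
from scipy.stats import norm

z = 2.0
p_one_sided = norm.sf(z)     # P(Z > 2) for Z ~ N(0, 1)
print(f"{p_one_sided:.4f}")  # about 0.0228, i.e. less than 2.5%
```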
Considerable effort has been expended trying to maintain the classical model in large-scale testing situations, as seen in Chapter 3, but there are important differences that affect the role of the null distribution when the number of cases N is large:
• With N = 10,000, for example, the statistician has his or her own “long run” in hand. This diminishes the importance of theoretical null calculations based on mathematical models. In particular, it may become clear that the classical null distribution appropriate for a single-test application is in fact wrong for the current situation (see the sketch after this list).
• Scientific applications of single-test theory most often suppose, or hope for, rejection of the null hypothesis, perhaps with power = 0.80. Large-scale studies are usually carried out with the expectation that most of the N cases will accept the null hypothesis, leaving only a small number of interesting prospects for more intensive investigation.
• Sharp null hypotheses, such as H0 : μ = 0 for z ∼ N(μ, 1), are less important in large-scale studies. […]
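The point of the first bullet can be made concrete with a small simulation: with thousands of z-values in hand, even a crude robust fit to the central bulk of the histogram can reveal that the theoretical N(0, 1) null is off. The median/MAD fit below is only an illustrative stand-in, not the empirical-null estimates developed in the book, and the simulated data (a slightly shifted, overdispersed null plus a few hundred interesting cases) is an assumption made purely for the example.

```python
import numpy as np

def crude_empirical_null(z):
    """Rough location/scale estimate for the central bulk of the z-values,
    standing in for the more careful empirical-null fits discussed in the book."""
    center = np.median(z)
    # 1.4826 * MAD approximates the standard deviation for Gaussian data
    scale = 1.4826 * np.median(np.abs(z - center))
    return center, scale

# Simulated example: 10,000 cases, most of them "null" but drawn from
# N(0.1, 1.2^2) rather than N(0, 1), plus 300 genuinely interesting cases.
rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(0.1, 1.2, 9_700), rng.normal(3.0, 1.0, 300)])

center, scale = crude_empirical_null(z)
print(f"theoretical null: N(0, 1);  crude empirical fit: N({center:.2f}, {scale:.2f}^2)")
```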