Applied statistics is an inherently conservative enterprise, and appropriately so since the scientific world depends heavily on the consistent evaluation of evidence. Conservative consistency is raised to its highest level in classical significance testing, where the control of Type I error is enforced with an almost religious intensity. A p-value of 0.06 rather than 0.04 has decided the fate of entire pharmaceutical companies. Fisher's scale of evidence, Table 3.1, particularly the α = 0.05 threshold, has been used in literally millions of serious scientific studies, and stakes a good claim to being the 20th century's most influential piece of applied mathematics.
All of this makes it more than a little surprising that a powerful rival to Type I error control has emerged in the large-scale testing literature. Since its debut in Benjamini and Hochberg's seminal 1995 paper, false discovery rate control has claimed an increasing portion of statistical research, both applied and theoretical, and seems to have achieved “accepted methodology” status in scientific subject-matter journals.
False discovery rate control moves us away from the significance-testing algorithms of Chapter 3, back toward the empirical Bayes context of Chapter 2. The language of classical testing is often used to describe FDR methods (perhaps in this way assisting their stealthy infiltration of multiple testing practice), but, as the discussion here is intended to show, both their rationale and results are quite different.
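For concreteness, here is a minimal sketch of the Benjamini–Hochberg step-up rule introduced in that 1995 paper; the function name and the level q = 0.1 are illustrative choices, not taken from the text.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.1):
    """BH step-up rule: reject the hypotheses with the k smallest p-values,
    where k is the largest rank i (1-based) with p_(i) <= q * i / N.
    This controls the false discovery rate at level q under independence
    (and certain forms of positive dependence)."""
    pvals = np.asarray(pvals)
    N = len(pvals)
    order = np.argsort(pvals)                    # indices of the p-values, smallest first
    sorted_p = pvals[order]
    thresholds = q * np.arange(1, N + 1) / N
    passing = np.nonzero(sorted_p <= thresholds)[0]
    reject = np.zeros(N, dtype=bool)
    if passing.size > 0:
        k = passing.max()                        # largest rank meeting its threshold
        reject[order[:k + 1]] = True             # reject everything up to that rank
    return reject
```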
Simultaneous hypothesis testing was a lively topic in the early 1960s, my graduate student years, and had been so since the end of World War II. Rupert Miller's book Simultaneous Statistical Inference appeared in 1966, providing a beautifully lucid summary of the contemporary methodology. A second edition in 1981 recorded only modest gains during the intervening years. This was a respite, not an end: a new burst of innovation in the late 1980s generated important techniques that we will be revisiting in this chapter.
Miller's book, which gives a balanced picture of the theory of that time, has three notable features:
1. It is overwhelmingly frequentist.
2. It is focused on control of α, the overall Type I error rate of a procedure.
3. It is aimed at multiple testing situations where the number of individual cases N is between 2 and, say, 10.
We have now entered a scientific age in which N = 10 000 is no cause for raised eyebrows. It is impressive (or worrisome) that the theory of the 1980s continues to play a central role in microarray-era statistical inference. Features 1 and 2 are still the norm in much of the multiple testing literature, despite the obsolescence of Feature 3. This chapter reviews part of that theory, particularly the ingenious algorithms that have been devised to control the overall Type I error rate (also known as FWER, the family-wise error rate).
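Two standard members of that family are easy to state; the sketch below of Bonferroni's single-step bound and Holm's step-down refinement is a generic illustration, not necessarily the specific procedures the chapter goes on to emphasize.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject H0_i whenever p_i <= alpha / N; FWER <= alpha under any dependence."""
    pvals = np.asarray(pvals)
    return pvals <= alpha / len(pvals)

def holm(pvals, alpha=0.05):
    """Holm's step-down rule: compare the i-th smallest p-value (1-based i)
    to alpha / (N - i + 1) and stop at the first failure; also keeps
    FWER <= alpha while rejecting at least as much as Bonferroni."""
    pvals = np.asarray(pvals)
    N = len(pvals)
    reject = np.zeros(N, dtype=bool)
    for i, idx in enumerate(np.argsort(pvals)):  # i = 0, ..., N-1 over sorted p-values
        if pvals[idx] <= alpha / (N - i):
            reject[idx] = True
        else:
            break                                # no further rejections once one fails
    return reject
```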
Charles Stein shocked the statistical world in 1955 with his proof that maximum likelihood estimation methods for Gaussian models, in common use for more than a century, were inadmissible beyond simple one- or two-dimensional situations. These methods are still in use, for good reasons, but Stein-type estimators have pointed the way toward a radically different empirical Bayes approach to high-dimensional statistical inference. We will be using empirical Bayes ideas for estimation, testing, and prediction, beginning here with their path-breaking appearance in the James–Stein formulation.
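In its simplest form, for z ∼ N_N(μ, I) with N ≥ 3, the James–Stein rule shrinks the maximum likelihood estimate μ̂ = z toward the origin, μ̂_JS = (1 − (N − 2)/‖z‖²) z, and has smaller total squared-error risk than the MLE for every value of μ; the notation here is generic, not taken from the text's own development.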
Although the connection was not immediately recognized, Stein's work was half of an energetic post-war empirical Bayes initiative. The other half, explicitly named “empirical Bayes” by its principal developer Herbert Robbins, was less shocking but more general in scope, aiming to show how frequentists could achieve full Bayesian efficiency in large-scale parallel studies. Large-scale parallel studies were rare in the 1950s, however, and Robbins' theory did not have the applied impact of Stein's shrinkage estimators, which are useful in much smaller data sets.
All of this has changed in the 21st century. New scientific technologies, epitomized by the microarray, routinely produce studies of thousands of parallel cases — we will see several such studies in what follows — well-suited for the Robbins point of view. That view predominates in the succeeding chapters, though Robbins' methodology is not explicitly invoked until the very last section of the book.
Microarray experiments, through a combination of insufficient data per gene and the difficulties of large-scale simultaneous inference, often yield disappointing results. In search of greater detection power, enrichment analysis considers the combined outcomes of biologically determined sets of genes, for example the set of all the genes in a predefined genetic pathway. If all 20 z-values in a hypothetical pathway were positive, we might assign significance to the pathway's effect, whether or not any of the individual zi were deemed non-null. We will consider enrichment methods in this chapter, and some of the theory, which of course applies just as well to similar situations outside the microarray context.
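As a rough illustration of the idea (and not the particular enrichment statistics developed in the chapter), a gene set can be scored by the standardized mean of its z-values and calibrated against randomly drawn sets of the same size; the function names and the randomization null below are assumptions made for this sketch.

```python
import numpy as np

def set_score(z, gene_set):
    """Standardized mean of the z-values in a gene set: if membership were
    unrelated to effect size, the mean of m randomly chosen z-values would
    have standard deviation roughly sd(z) / sqrt(m)."""
    z = np.asarray(z)
    zs = z[np.asarray(gene_set)]
    return np.sqrt(len(zs)) * zs.mean() / z.std()

def enrichment_pvalue(z, gene_set, n_perm=2000, seed=0):
    """Randomization p-value: compare the observed (absolute) set score with
    the scores of randomly drawn gene sets of the same size."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z)
    obs = abs(set_score(z, gene_set))
    m = len(gene_set)
    perm = np.array([abs(set_score(z, rng.choice(len(z), size=m, replace=False)))
                     for _ in range(n_perm)])
    return (1 + np.sum(perm >= obs)) / (n_perm + 1)
```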
Our main example concerns the p53 data, partially illustrated in Figure 9.1; p53 is a transcription factor, that is, a gene that controls the activity of other genes. Mutations in p53 have been implicated in cancer development. A National Cancer Institute microarray study compared 33 mutated cell lines with 17 in which p53 was unmutated. There were N = 10 100 gene expressions measured for each cell line, yielding a 10 100 × 50 matrix X of expression measurements. Z-values based on two-sample t-tests were computed for each gene, as in (2.1)–(2.5), comparing mutated with unmutated cell lines. Figure 9.1 displays the 10 100 zi values.
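A sketch of that computation, assuming the columns of X are arranged with the 33 mutated cell lines first (an assumption about the layout, not stated above) and using the transformation z_i = Φ⁻¹(F₄₈(t_i)) that the pattern of (2.1)–(2.5) suggests; the function name is illustrative.

```python
import numpy as np
from scipy import stats

def p53_z_values(X, n_mut=33, n_unmut=17):
    """Per-gene two-sample t-statistics (mutated vs. unmutated columns of X),
    mapped to z-values by z_i = Phi^{-1}(F_df(t_i)), df = n_mut + n_unmut - 2 = 48,
    so that z_i ~ N(0, 1) under the null hypothesis of no expression difference."""
    t_stat, _ = stats.ttest_ind(X[:, :n_mut], X[:, n_mut:], axis=1)  # equal-variance t-test per row
    df = n_mut + n_unmut - 2
    return stats.norm.ppf(stats.t.cdf(t_stat, df))
```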
At the risk of drastic oversimplification, the history of statistics as a recognized discipline can be divided into three eras:
The age of Quetelet and his successors, in which huge census-level data sets were brought to bear on simple but important questions: Are there more male than female births? Is the rate of insanity rising?
The classical period of Pearson, Fisher, Neyman, Hotelling, and their successors, intellectual giants who developed a theory of optimal inference capable of wringing every drop of information out of a scientific experiment. The questions dealt with still tended to be simple — Is treatment A better than treatment B? — but the new methods were suited to the kinds of small data sets individual scientists might collect.
The era of scientific mass production, in which new technologies typified by the microarray allow a single team of scientists to produce data sets of a size Quetelet would envy. But now the flood of data is accompanied by a deluge of questions, perhaps thousands of estimates or hypothesis tests that the statistician is charged with answering together; not at all what the classical masters had in mind.
In classical significance testing, the null distribution plays the role of devil's advocate: a standard that the observed data must exceed in order to convince the scientific world that something interesting has occurred. We observe, say, z = 2, and note that in a hypothetical “long run” of observations from a N(0, 1) distribution less than 2.5% of the draws would exceed 2, thereby discrediting the uninteresting null distribution as an explanation.
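The 2.5% figure is just the standard normal tail area beyond 2; a quick check in Python with scipy, purely for illustration:

```python
from scipy.stats import norm

print(norm.sf(2))   # about 0.0228, so fewer than 2.5% of N(0, 1) draws exceed 2
```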
Considerable effort has been expended trying to maintain the classical model in large-scale testing situations, as seen in Chapter 3, but there are important differences that affect the role of the null distribution when the number of cases N is large:
• With N = 10 000, for example, the statistician has his or her own “long run” in hand. This diminishes the importance of theoretical null calculations based on mathematical models. In particular, it may become clear that the classical null distribution appropriate for a single-test application is in fact wrong for the current situation.
• Scientific applications of single-test theory most often suppose, or hope for, rejection of the null hypothesis, perhaps with power = 0.80. Large-scale studies are usually carried out with the expectation that most of the N cases will accept the null hypothesis, leaving only a small number of interesting prospects for more intensive investigation.
• Sharp null hypotheses, such as H0 : μ = 0 for z ∼ N(μ, 1), are less important in large-scale studies. […]