Abstract. Model diagnostics are shown to have little power unless alternative hypotheses can be narrowly defined. For example, independence of observations cannot be tested against general forms of dependence. Thus, the basic assumptions in regression models cannot be inferred from the data. Equally, the proportionality assumption in proportional-hazards models is not testable. Specification error is a primary source of uncertainty in forecasting, and this uncertainty will be difficult to resolve without external calibration. Model-based causal inference is even more problematic.
Introduction
The object here is to sketch a demonstration that, unless additional regularity conditions are imposed, model diagnostics have power only against a circumscribed class of alternative hypotheses. The chapter is organized around the familiar requirements of statistical models. Theorems 1 and 2, for example, consider the hypothesis that distributions are continuous and have densities. According to the theorems, such hypotheses cannot be tested without additional structure.
Let us agree, then, that distributions are smooth. Can we test independence? Theorems 3 and 4 indicate the difficulty. Next, we grant independence and consider tests that distinguish between (i) independent and identically distributed random variables on the one hand, and (ii) independent but differently distributed variables on the other. Theorem 5 shows that, in general, power is lacking.
For ease of exposition, we present results for the unit interval; transformation to the positive half-line or the whole real line is easy.
In fields such as biology, the medical sciences, sociology, and economics, researchers often face situations where the number of available observations, or the amount of available information, is small enough that approximations based on the normal distribution may be unreliable. Theoretical work over the last quarter-century has led to new likelihood-based methods that yield very accurate approximations in finite samples, but this work has had limited impact on statistical practice. This book illustrates, by means of realistic examples and case studies, how to use the new theory, and investigates how and when it makes a difference to the resulting inference. The treatment is oriented towards practice and comes with code in the R language (available from the web) that enables the methods to be applied in a range of situations of interest to practitioners. The analysis includes some comparisons of higher-order likelihood inference with bootstrap or Bayesian methods.
Association schemes are of interest to both mathematicians and statisticians, and this book was written with both audiences in mind. For statisticians, it shows how to construct designs for experiments in blocks, how to compare such designs, and how to analyse data from them. The reader is assumed to know only very basic abstract algebra. For pure mathematicians, it tells why association schemes are important and develops the theory to the level of advanced research. The book arose from a course successfully taught by the author, and as such the material is thoroughly class-tested. A great number of examples and exercises will increase its appeal to both graduate students and their instructors. It is ideal for readers from either a pure mathematics or a statistics background who wish to develop their understanding of association schemes.
Students and investigators working in statistics, biostatistics, or applied statistics in general are constantly exposed to problems that involve large quantities of data. This is even more evident today, when massive datasets with an impressive amount of detail are produced in novel fields such as genomics or bioinformatics at large. Because, in such a context, exact statistical inference may be computationally out of reach and in many cases not even mathematically tractable, these users have to rely on approximate results. Traditionally, the justification for these approximations was based on the convergence of the first four moments of the distributions of the statistics under investigation to those of some normal distribution. Today we know that such an approach is not always theoretically adequate and that a somewhat more sophisticated set of techniques based on asymptotic considerations may provide the appropriate justification. This need for more profound mathematical theory in statistical large-sample theory is patent in areas involving dependent sequences of observations, such as longitudinal and survival data or life tables, in which the use of martingale or related structures has distinct advantages.
Unfortunately, most of the technical background for understanding such methods appears in specialized articles or in textbooks written for a readership with such a high level of mathematical knowledge that they exclude a great portion of the potential users. We tried to bridge this gap in a previous text (Sen and Singer [1993]: Large Sample Methods in Statistics: An Introduction with Applications), on which our new enterprise is based.
Statistical estimation as well as hypothesis testing may be viewed as important topics of a more general (and admittedly more abstract) statistical decision theory (SDT). Having its genesis in the theory of games and an affinity with Bayes methods, SDT has been continuously fortified with sophisticated mathematical tools as well as with philosophical justifications. In conformity with the general objectives and the intended intermediate level of this monograph, we provide an overall introduction to the general principles of SDT, with some emphasis on Bayes methodology (as well as some of its variants), avoiding the usual philosophical deliberations and mathematical sophistication to the extent possible. See Berger (1993) for a detailed exposition.
The connection between estimation and hypothesis testing theories treated in the preceding chapters and SDT relates to the uncertainty of statistical conclusions or decisions based on observed data sets and to the adequate provision for quantifying the frequency of incorrect ones. This has generated the notion of loss and risk functions that form the foundation of SDT. This notion serves as a building block for the formulation of minimum-risk and minimax (risk) estimation theory, in which Bayes estimates hold a focal position. In the same vein, Bayes tests, which are not necessarily isomorphic to the Neyman–Pearson–Wald likelihood-based tests, have cropped up in SDT. In either case, the basic difference comes from the concepts of prior and posterior distributions that bring in more room for subjective judgement in the inferential process.
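As an illustrative aside (not taken from the text), the interplay of loss, risk, and Bayes estimation can be sketched by Monte Carlo: under squared-error loss, the posterior-mean (Bayes) estimator of a normal mean shrinks the sample mean toward the prior mean and attains smaller risk near it. The prior variance `tau2` and all simulation settings below are arbitrary choices, not values from the text.

```python
import random

def risk(estimator, theta, n=20, reps=4000, seed=0):
    """Monte Carlo approximation of the squared-error risk E[(T - theta)^2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        total += (estimator(sample) - theta) ** 2
    return total / reps

def mean(x):
    return sum(x) / len(x)

def bayes(x, tau2=1.0):
    """Posterior mean under a hypothetical N(0, tau2) prior on theta:
    the sample mean shrunk toward the prior mean 0."""
    n = len(x)
    return n * tau2 / (n * tau2 + 1.0) * mean(x)

# Same seed => same simulated samples, so the comparison is paired.
r_mean_0 = risk(mean, 0.0)    # risk of the sample mean at theta = 0 (about 1/n)
r_bayes_0 = risk(bayes, 0.0)  # smaller: shrinkage helps near the prior mean
```

Far from the prior mean the ordering reverses, which is precisely the subjective element that the prior introduces into the inferential process.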
In general, categorical data models relate to count data corresponding to the classification of sampling units into groups or categories on either a qualitative or a quantitative basis. These categories may be defined by the essentially discrete nature of the phenomenon under study (see Example 1.2.11 dealing with the OAB blood classification model) or, often for practical reasons, by the grouping of the values of an essentially continuous underlying distribution (e.g., shoe sizes: 5, 5½, 6, 6½, etc. corresponding to half-open intervals for the actual length of a foot). Even in the qualitative case there is often an implicit ordering in the categories resulting in ordered categorical data (e.g., ratings: excellent, very good, good, fair, and poor, for a research proposal under review). Except in some of the simplest cases, exact statistical analysis for categorical data models may not be available in a unified, simple form. Hence, asymptotic methods are important in this context. They not only provide a unified coverage of statistical methodology appropriate for large sample sizes but also suggest suitable modifications, which may often be appropriate for moderate to small sample sizes. This chapter is devoted to the study of this related asymptotic theory.
Although there are a few competing probabilistic models for statistical analysis of categorical data sets, we will find it convenient to concentrate on the product multinomial model, which encompasses a broad domain and plays a key role in the development of appropriate statistical analysis tools.
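A minimal sketch of the kind of asymptotic method alluded to above: the Pearson goodness-of-fit statistic for a single multinomial compares observed with expected counts and is asymptotically chi-square with k − 1 degrees of freedom under the null hypothesis. The counts below are hypothetical, not taken from the text.

```python
def pearson_chi2(counts, probs):
    """Pearson goodness-of-fit statistic sum (O - E)^2 / E for one multinomial
    sample; under H0 it is asymptotically chi-square with len(counts) - 1 df."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))

# Hypothetical counts for four categories, tested against equal cell probabilities.
counts = [26, 31, 22, 21]
stat = pearson_chi2(counts, [0.25] * 4)  # (1 + 36 + 9 + 16) / 25 = 2.48
# The 95th percentile of chi-square with 3 df is about 7.81, so H0 stands here.
reject = stat > 7.81
```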
This book should be on the shelf of every practising statistician who designs experiments. Good design considers units and treatments first, and then allocates treatments to units. It does not choose from a menu of named designs. This approach requires a notation for units that does not depend on the treatments applied. Most structure on the set of observational units, or on the set of treatments, can be defined by factors. This book develops a coherent framework for thinking about factors and their relationships, including the use of Hasse diagrams. These are used to elucidate structure, calculate degrees of freedom and allocate treatment subspaces to appropriate strata. Based on a one-term course the author has taught since 1989, the book is ideal for advanced undergraduate and beginning graduate courses. Examples, exercises and discussion questions are drawn from a wide range of real applications: from drug development, to agriculture, to manufacturing.
The general asymptotics presented in Chapters 6 and 7 play an important role in large sample methods in statistical inference. Yet there are some asymptotic methods in statistical inference that rest on more sophisticated probability tools developed mostly in the past forty years, although most of these tools are beyond the reach of the contemplated level of this text. We therefore take an intermediate route, introduce the basic concepts of these weak convergence approaches in less abstract ways, and emphasize their fruitful statistical use. Among the stochastic processes entering our contemplated coverage, a few are most commonly used in statistical inference:
(i) partial sum processes;
(ii) empirical distributional processes; and
(iii) statistical functionals.
In order to treat them in a unified way, we first explain how the weak convergence results of Chapter 7 have led to more abstract but impactful weak invariance principles (Section 11.2). Section 11.3 deals with partial sum processes, and Section 11.4 with empirical processes. Section 11.5 is devoted to statistical functionals. Some applications of weak invariance principles are considered in Section 11.6; in particular, some resampling plans (e.g., jackknifing and bootstrapping) are appraised in the light of these weak invariance principles. The embedding of the Wiener process is outlined in Section 11.7, and some general remarks are presented in the last section.
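A small simulation (illustrative only, not part of the text) conveys the idea behind weak invariance principles for partial sum processes: the scaled partial sums S_k / sqrt(n) of i.i.d. mean-zero, unit-variance steps behave, for large n, like a Brownian motion on [0, 1]; in particular, the endpoint W_n(1) is approximately N(0, 1).

```python
import random

def partial_sum_path(n, rng):
    """One path of the scaled partial-sum process t -> S_[nt] / sqrt(n),
    built from i.i.d. +/-1 steps (mean 0, variance 1)."""
    s, path = 0.0, [0.0]
    for _ in range(n):
        s += rng.choice((-1.0, 1.0))
        path.append(s / n ** 0.5)
    return path

rng = random.Random(1)
# By Donsker's invariance principle, the endpoint W_n(1) is approximately
# N(0, 1), so its sample variance over many simulated paths should be near 1.
endpoints = [partial_sum_path(400, rng)[-1] for _ in range(2000)]
var = sum(x * x for x in endpoints) / len(endpoints)
```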
Testing statistical hypotheses, a problem dual to estimation, has the prime objective of making decisions about some population characteristic(s) using information obtained from sample data. A statistical hypothesis is a statement regarding a target distribution, or some parameters associated with it, the tenability of which is to be ascertained via statistical reasoning. In this context, the decision based on random samples may not always be correct, so appropriate strategies are needed to control the frequency of such errors. In this respect, the genesis of finite-sample principles of hypothesis testing stemmed primarily from the pioneering work of J. Neyman and E. S. Pearson in the 1930s. The Neyman–Pearsonian foundation for parametric as well as nonparametric setups, in conjunction with other tributaries, is appraised here under a finite-sample (exact) methodological framework, along with its transit to asymptotic reasoning.
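A sketch of the Neyman–Pearson construction for two simple hypotheses (the model and all settings are illustrative choices, not from the text): for H0: N(0, 1) versus H1: N(1, 1), the likelihood ratio is monotone in the sample mean, so the most powerful level-alpha test rejects for large sample means.

```python
import random

def np_reject(sample, cutoff):
    """Most powerful test of H0: N(0,1) vs H1: N(1,1). By the Neyman-Pearson
    lemma, the likelihood ratio is increasing in the sample mean, so rejecting
    for a large mean is equivalent to rejecting for a large likelihood ratio."""
    return sum(sample) / len(sample) > cutoff

rng = random.Random(2)
n, reps = 25, 4000
cutoff = 1.645 / n ** 0.5  # approximate level 0.05 (1.645 = N(0,1) 95th percentile)
level = sum(np_reject([rng.gauss(0.0, 1.0) for _ in range(n)], cutoff)
            for _ in range(reps)) / reps
power = sum(np_reject([rng.gauss(1.0, 1.0) for _ in range(n)], cutoff)
            for _ in range(reps)) / reps
```

The Monte Carlo rejection frequencies approximate the size (under H0) and the power (under H1) of the test.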
Section 3.2 deals primarily with the basic concepts and the formulation of simple hypothesis testing problems. The more common situation of composite hypothesis testing is considered in more detail in Section 3.3, where diverse statistical approaches yielding different testing procedures are considered; in particular, invariant tests are highlighted. The interplay of invariance and sufficiency in parametric as well as nonparametric setups is analyzed in Section 3.4. Bayes procedures are discussed in Chapter 4.
Unbiasedness, efficiency, sufficiency, and ancillarity, as outlined in Chapters 2 and 3, are essentially finite-sample concepts, but consistency refers to indefinitely increasing sample sizes and thus has an asymptotic nature. In general, finite-sample optimality properties of estimators and tests hold only for a small class of probability laws, mostly related to the exponential family of distributions; consistency, however, holds under much less restrictive setups, as we will see. Moreover, even when finite-sample optimal statistical procedures exist, they may not lead to closed-form expressions or may entail a heavy computational burden. These problems are less bothersome when we adopt an asymptotic point of view and use the corresponding results to obtain good approximations of such procedures for large (although finite) samples. This is accomplished with the incorporation of probability inequalities, limit theorems, and other tools that will be developed in this and subsequent chapters.
In this context, a minimal requirement for a good statistical decision rule is its increasing reliability with increasing sample sizes (consistency). For an estimator, consistency relates to an increasing closeness to its population counterpart as the sample sizes become larger. In view of its stochastic nature, this closeness needs to incorporate its fluctuation around the parameter it estimates and thus requires an appropriate adaptation of the definitions usually considered in nonstochastic setups. Generally, a distance function or norm of this stochastic fluctuation is incorporated in the formulation of this closeness, and consistency refers to the convergence of this norm to 0 in some well-defined manner.
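The notion of consistency just described can be conveyed by a small simulation (an illustrative sketch, not from the text): for the sample mean of i.i.d. N(theta, 1) observations, the probability of deviating from theta by more than a fixed eps shrinks as n grows; this is convergence in probability, i.e., weak consistency.

```python
import random

def deviation_prob(n, theta=2.0, eps=0.25, reps=2000, seed=3):
    """Monte Carlo estimate of P(|sample mean - theta| > eps) for n i.i.d.
    N(theta, 1) observations; theta, eps, and reps are illustrative values."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        hits += abs(xbar - theta) > eps
    return hits / reps

# Weak consistency in action: the deviation probability shrinks with n.
p_small, p_large = deviation_prob(10), deviation_prob(200)
```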
We consider a set of independent and identically distributed (i.i.d.) random variables X1, …, Xn following a probability law Pθ, the form of which is known up to some associated parameter θ, treated as an unknown constant. Both the Xi and θ can be vector-valued. The set X of possible values of the sample X1, …, Xn is called the sample space. We assume that θ ∈ Θ, the parameter space, so that the parametric family of probability laws may be represented as PΘ = {Pθ; θ ∈ Θ}. One of the objectives of statistical estimation theory is to develop methods of choosing appropriate statistics Tn = T(X1, …, Xn), that is, functions of the sample observations, to estimate θ (i.e., to guess the true value of θ) in a reproducible way. In this context, Tn is an estimator of θ.
In an alternative (nonparametric) setup, we may allow the probability law P to be a member of a general class P, not necessarily indexed by some parameter, and then our interest lies in estimating the probability law P itself or some functional θ(P) thereof, without specifying the form of P.
There may be many variations of this setup, wherein the Xi need be neither independent nor identically distributed, as in multisample models, linear models, time-series models, stochastic processes, or unequal probability sampling models.
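In the nonparametric setup mentioned above, the canonical estimator of the probability law P itself is the empirical distribution function, which places mass 1/n at each observation. A minimal sketch (the data and settings are illustrative, not from the text):

```python
import random

def ecdf(sample):
    """Empirical distribution function: the nonparametric estimator of P
    that puts mass 1/n on each observation."""
    xs = sorted(sample)
    n = len(xs)
    def F(t):
        return sum(x <= t for x in xs) / n
    return F

rng = random.Random(4)
sample = [rng.uniform(0.0, 1.0) for _ in range(1000)]
F = ecdf(sample)
# For Uniform(0, 1) data, F(t) should be uniformly close to t
# (the Glivenko-Cantelli theorem); check on a coarse grid of points.
err = max(abs(F(t / 10) - t / 10) for t in range(11))
```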
Statistics is a body of mathematically based methodology designed to organize, describe, model, and analyze data. In this context, statistical inference relates to the process of drawing conclusions about the unknown frequency distribution (or some summary measure therefrom) of some characteristic of a population based on a known subset thereof (the sample data, or, for short, the sample). Drawing statistical conclusions involves the choice of suitable models that allow for random errors, and this, in turn, calls for convenient probability laws. It also involves the ascertainment of how appropriate a postulated probability model is for the genesis of a given dataset, and of how adequate the sample size is to keep the frequency of incorrect conclusions within acceptable limits.
Finite-sample statistical inference tools, in use for many decades, are appealing because, in general, they provide “exact” statistical results. As such, finite-sample methodology has experienced continuous upgrading with the annexation of novel concepts and approaches; Bayesian methods are especially noteworthy in this respect. Nevertheless, it is well established that the scope of exact statistical inference, carried out in an optimal or at least desirable way, is rather confined to some special classes of probability laws (such as the exponential family of densities). In real-life applications, such optimal statistical inference tools often stumble into impasses, ranging from validity to efficacy, and thus have practical drawbacks. This is particularly the case with large datasets, which are encountered in diverse (and often interdisciplinary) studies, more so now than in the past.
In Chapters 6 and 7 we developed tools to study the stochastic convergence and the asymptotic distributions of a general class of statistics. These tools can be directly incorporated in the study of asymptotic properties of a variety of (point as well as interval) estimators and tests. However, many of the methods discussed in Section 2.4 rest on suitable estimating equations whose solutions, in general, cannot be expressed as explicit functions of the sample observations. Even when such closed-form expressions are available, they are not exactly in the form of the statistics treated in Chapters 6 and 7. Similarly, for composite hypotheses, especially when the underlying distributions do not belong to the exponential family, test statistics may suffer from the same drawbacks. The classical maximum likelihood estimators and likelihood ratio tests generally have this undesirable feature, although they may have well-defined asymptotic optimality properties. In this chapter, we consider a general method to obtain asymptotic properties for a very broad class of statistics that fall in this category. Basically, we provide a viable link to the topics considered in the preceding two chapters, borrowing strength from the methodology outlined there.
In Section 8.2, using a uniform asymptotic linearity property, we obtain the asymptotic properties of estimators generated by estimating equations like (2.4.21). Although the MLE is a special member of this family of estimators, we provide related specialized details in Section 8.3, because of its important role in statistical inference.
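To illustrate the kind of estimator at issue here, one defined through an estimating equation with no closed-form solution, consider the Cauchy location model (an illustrative stand-in, not the book's equation (2.4.21)): the MLE solves the score equation, which must be handled numerically, here by bisection over an interval that brackets a root.

```python
def cauchy_score(theta, xs):
    """Score function (derivative of the log-likelihood) for the Cauchy
    location model; the MLE solves cauchy_score(theta, xs) = 0, an
    estimating equation with no closed-form solution."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def solve_mle(xs, tol=1e-10):
    """Bisection on the score over [min(xs), max(xs)], which brackets a root:
    the score is positive at min(xs) and negative at max(xs)."""
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cauchy_score(mid, xs) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

xs = [-1.9, 0.1, 0.4, 0.7, 1.2, 3.5]  # hypothetical data, for illustration only
theta_hat = solve_mle(xs)
```

In practice one would also verify that the located root is the global maximizer of the likelihood, since the Cauchy score can have multiple roots.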