Research on nonparametric estimation under shape constraints started in the 1950s. Papers such as Ayer et al., 1955, and Van Eeden, 1956, appeared on estimation of functions under the restriction of monotonicity or unimodality, more generally called isotonic estimation. An isotonic estimator is an estimator that is computed under an order restriction, where the order can be a partial order. The order restriction can also be imposed on the derivative of the estimator, so an estimator of a convex function (in dimension one or higher), which is itself also convex, is also called an isotonic estimator.
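As an illustration of estimation under an order restriction, the following is a minimal sketch of the pool-adjacent-violators algorithm, a standard way to compute the least squares nondecreasing fit to a sequence. The function name `pava` and the optional weights argument are choices made for this sketch, not notation from the text.

```python
def pava(y, w=None):
    """Least squares nondecreasing fit to y (optionally weighted).

    Illustrative sketch of pool-adjacent-violators; names are hypothetical.
    """
    n = len(y)
    if w is None:
        w = [1.0] * n
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool the last two blocks while they violate the order restriction.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the block means back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


if __name__ == "__main__":
    print(pava([1, 3, 2, 4]))
```

Each violation of the order is resolved by replacing the two offending blocks with their weighted average, so the result is constant on pooled blocks and nondecreasing overall.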
A summary of the early work was given in the well-known book by Barlow et al., 1972, on isotonic regression. Originally, the focus was on defining and constructing estimators satisfying these order constraints. As an example, in Grenander, 1956, it is shown that the (nonparametric) maximum likelihood estimator (MLE) of a monotone decreasing density can be constructed as the left-continuous slope of the least concave majorant of the empirical distribution function. Developing asymptotic distribution theory for these isotonic estimators turned out to be rather difficult. Nonnormal limit distributions appear and rates of convergence are slower than the square root of the sample size. This behavior is now commonly classified as belonging to the area of nonstandard asymptotics. In the case of the mentioned Grenander MLE, the rate of convergence of this estimator (evaluated at a fixed point, under some local assumptions) is the cube root of the sample size. Moreover, the nonnormal asymptotic distribution of the estimator is (after rescaling) the so-called Chernoff distribution, which is (up to a factor 2) the distribution of the derivative of the greatest convex minorant of two-sided Brownian motion with parabolic drift, evaluated at zero.
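The construction of the Grenander estimator described above can be sketched in a few lines. This is only an illustration, assuming a sample of distinct positive observations; the function names `grenander` and `grenander_density` are hypothetical, and the least concave majorant is computed with a simple upper-hull scan rather than any particular algorithm from the literature.

```python
import numpy as np


def grenander(x):
    """Knots and slopes of the least concave majorant of the empirical CDF.

    Assumes distinct, strictly positive observations (illustrative sketch).
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    px = np.concatenate(([0.0], x))      # abscissae of the ECDF points
    py = np.arange(n + 1) / n            # ECDF values 0, 1/n, ..., 1
    hull = [0]
    for i in range(1, n + 1):
        # Drop the last hull point while it lies on or below the chord
        # from the previous hull point to the new point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            lhs = (py[b] - py[a]) * (px[i] - px[b])
            rhs = (py[i] - py[b]) * (px[b] - px[a])
            if lhs <= rhs:
                hull.pop()
            else:
                break
        hull.append(i)
    knots = px[hull]
    slopes = np.diff(py[hull]) / np.diff(knots)
    return knots, slopes


def grenander_density(t, knots, slopes):
    """Left-continuous slope of the majorant: slopes[k] on (knots[k], knots[k+1]]."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    idx = np.searchsorted(knots, t, side="left") - 1
    inside = (idx >= 0) & (idx < len(slopes))
    safe = np.clip(idx, 0, len(slopes) - 1)
    # Zero outside (0, largest observation].
    return np.where(inside, slopes[safe], 0.0)


if __name__ == "__main__":
    knots, slopes = grenander([1.0, 3.0, 4.0])
    print(knots, slopes, grenander_density([0.5, 1.0, 2.0, 5.0], knots, slopes))
```

The slopes are decreasing by construction, and the resulting step function integrates to one because the majorant runs from 0 at the origin to 1 at the largest observation.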
Research on isotonic regression received new impetus in the 1990s when it became clear that it was the right setting for studying (nonparametric) MLEs of the distribution function in inverse problems.
In Chapter 3, pointwise asymptotic results are derived for estimators in some of the basic models involving monotonicity as described in Chapter 2. In this chapter, further asymptotic pointwise results will be derived, now for estimators introduced in Chapter 4 and Chapter 8. The first, in Section 11.1, gives the asymptotic distribution of the least squares estimator of a convex decreasing density, as introduced in Section 4.3. This needs to be derived solely from the characterization of the estimator, since an explicit representation of the estimator is lacking. The approach is based on the asymptotic behavior of the characterization. Section 11.2 is concerned with an interesting and useful tail bound for the maximum likelihood estimator in the current status model introduced in Section 2.3.
In Section 11.3, a local variant of smooth functional methods is applied to derive the asymptotic pointwise distribution of the smoothed maximum likelihood estimator (SMLE) in the current status model as introduced in Section 8.1. The $n^{1/3}$ rate of convergence for the plain MLE of the distribution function derived in Section 3.8 is replaced by the rate $n^{2/5}$ for the SMLE. For the interval censoring case 2 model of Section 4.7, the SMLE and the maximum smoothed likelihood estimator (MSLE) are considered in Section 11.4 and Section 11.5. Under the separation of inspection times hypothesis, the rates of convergence of these estimators are shown to be $n^{2/5}$, just as in the current status situation.
Finally, in Section 11.6, the problem of estimating a nondecreasing hazard rate under right censoring as introduced in Section 2.6 is considered. In this setting, too, local smooth functional theory is applied to derive the asymptotic distribution of the SMLE.
The LS Estimator of a Convex Density
The least squares estimator of a convex decreasing density as introduced and studied in Section 4.3 cannot be expressed in terms of the empirical distribution as easily as, e.g., the maximum likelihood (or least squares) estimator of a decreasing density.
In Chapter 3, asymptotic results were derived for the basic problems with monotonicity restrictions. The consistency results were global (sometimes uniform) whereas only pointwise asymptotic distributions were derived. These results were related to convex minorants and explicit representations of estimators. In this chapter, asymptotic results will be derived for quantities that depend more globally on the underlying distribution. These results can be obtained using so-called smooth functional theory.
In Section 10.1, we discuss the asymptotic distribution of the maximum likelihood estimator of the expected value of the underlying random variable in the deconvolution model. Computing this estimator requires some computational effort. We compare the asymptotics of this estimator to that of a natural and easy-to-compute competitor, the sample mean minus the expectation of the noise variable. The approach is based on smooth functional theory, where functionals of the underlying distribution of interest are approximated by smooth functionals of the observation distribution.
The remaining sections are devoted to smooth functionals for the interval censoring model. The theory is still moderately straightforward for the interval censoring case 1 model (or current status model) to be considered in Section 10.2, where we have an explicit expression for the score functions. This is rather different for the interval censoring case 2 model, where one can only say that these score functions are solutions of certain integral equations and the whole theory has to be developed from properties of these solutions. Important properties will be studied in Section 10.3. In Section 10.4 the properties are used to derive the asymptotic distribution of the MLE for smooth functionals in the interval censoring case 2 model. We only treat the so-called separated case, where the intervals between the two observation times cannot become arbitrarily small. This separated case is treated in full detail, since it seems to give the prototype for what to do in the case that the score functions are not explicitly given.
To give a feeling of what this book is about, it is perhaps best to take a look at some real-life examples. Real-life examples have the disadvantage of giving rise to a lot of discussion on the interpretation of the data, as the authors have experienced when they started a lecture with a real-life example. This often distracted the audience from the main message of the lecture. But they have the advantage of “sticking in the mind,” which might be more important than the temporary distraction they might cause. Therefore, the first four sections of this chapter are about real data. Section 1.1 is concerned with the estimation of the expected duration of ice (in days) at Lake Mendota in Wisconsin, assuming these expected durations decrease in time. In Section 1.2, a data set on time-till-onset of a nonlethal lung tumor for mice is studied. There are two groups of mice, one living in a conventional environment and the other in a germ-free environment. The main question then is whether the distribution of the time-till-onset of the tumor is affected by the choice of environment. The complication is that the times of onset are not precisely observed, but subject to censoring. The third example, in Section 1.3, concerns the estimation of a relatively complicated quantity, the transmission potential of a disease, also based on censored data on hepatitis A in Bulgaria. Section 1.4 introduces the Bangkok Metropolitan Administration injecting drug users cohort study, which is further analyzed in Chapter 12, using methods that were developed for competing risk models.
In Section 1.5, a particular shape constrained estimation problem is considered. It is argued that this problem (and many of the other problems to be considered in this book) can also be viewed from another perspective; for example, as an inverse problem, a mixture model, or a censoring problem. As will be seen later in this book, these points of view immediately suggest methods one could use for estimating shape constrained functions and methods one could use to compute these estimates.
In Chapter 2, various models were introduced where monotonicity constraints are clearly involved. In this chapter, more problems will be described where monotonicity constraints play an important role. These constraints can be related to the interpretation of these problems as inverse problems, as also seen in Section 1.5. Some distribution function F (by definition monotone) in the background is to be estimated based on data from an induced distribution function G. Monotonicity of F induces more or less explicit shape constraints on the sampling distribution function G.
First a classical problem from stereology is considered: Wicksell's problem. This is concerned with estimating the distribution of radii of spheres randomly scattered in an opaque medium based on radii of circular profiles obtained by intersecting the medium with a plane. The second problem is that of estimating a concave regression function based on noisy data. Related to this, a simple model from ornithology is introduced. It concerns estimating the distribution of sojourn times of birds at an oasis based on observed times when specific birds were caught at the oasis. As in Wicksell's problem, imposing certain assumptions, the sampling distribution can be expressed in terms of the underlying distribution of interest. Also the estimation of log concave densities, star shaped distribution functions and distribution functions in deconvolution models more general than that of Section 2.4 and in the interval censoring case 2 will be considered.
For the problems discussed in this chapter, estimation procedures will be introduced, characterizations of these estimators will be given, and some estimators will also be studied asymptotically. Estimation paradigms such as plug-in inverse estimators, least squares estimators, and maximum likelihood estimators will be illustrated and studied.
Wicksell's Corpuscle Problem
In the early 1920s, the Swedish mathematician Sven D. Wicksell at the University of Lund was confronted with an interesting problem from the medical sciences. Anatomist T. Helman tried to get hold of the distribution of the size of so-called follicles in human spleens. Postmortem examinations were executed, during which spleens were sliced at several places.
As seen in the examples discussed so far, shape-restricted estimators often satisfy the required shape constraint with minimal smoothness properties. The Grenander density estimator is decreasing, but discontinuous (see Figure 2.4). The least squares estimator for a convex decreasing density is convex and decreasing, but its derivative is discontinuous (see Figure 4.9). Similar observations can be made for other models. Sometimes, there are reasons to assume that an underlying distribution function is smooth. In other situations (as will be encountered in Chapter 9), smoothness of an estimated model is needed in a proof that a bootstrap method works.
In this chapter, the problem of estimating a smooth shape-constrained function is considered. The estimation of smooth functions without shape constraints has received quite some attention since the 1950s. Methods such as kernel smoothing and spline fitting have been widely applied and studied thoroughly. In order to obtain smooth shape-constrained estimators, various approaches are possible. A first is to smooth the nonsmooth shape-constrained estimator. In Section 8.1 this approach is illustrated using the maximum likelihood estimator (MLE) in the current status model. A related method interchanges the order of smoothing and maximizing in this procedure. In Section 8.2 it is first illustrated using the problem of estimating a decreasing density on [0, ∞) as introduced in Section 2.2. Instead of using the empirical distribution function in the definition of the log likelihood, a smooth estimator for the observation distribution function is used and then the corresponding smoothed (log) likelihood maximized to obtain an estimator. This method is also very natural if only binned observations are available. This will be seen in the context of Wicksell's problem as introduced in Section 4.1. Another method is to first estimate the distribution without using the shape constraint and process this estimator in such a way that it satisfies the shape constraint without losing its smoothness.
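The first approach, smoothing a nonsmooth shape-constrained estimator, can be sketched for the current status model as follows. The sketch assumes distinct observation times, computes the MLE as the isotonic regression of the indicators ordered by observation time, and then integrates an integrated kernel against the jumps of the MLE. The Epanechnikov kernel, the bandwidth argument `h`, and all function names are assumptions made for illustration; they are not taken from the text.

```python
import numpy as np


def current_status_mle(t, delta):
    """MLE of F at the sorted observation times (isotonic regression of delta).

    Assumes distinct observation times; illustrative sketch only.
    """
    order = np.argsort(t)
    t_sorted = np.asarray(t, dtype=float)[order]
    d = np.asarray(delta, dtype=float)[order]
    means, counts = [], []
    for v in d:
        means.append(v)
        counts.append(1)
        # Pool adjacent blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, c2 = means.pop(), counts.pop()
            m1, c1 = means.pop(), counts.pop()
            means.append((m1 * c1 + m2 * c2) / (c1 + c2))
            counts.append(c1 + c2)
    return t_sorted, np.repeat(means, counts)


def integrated_epanechnikov(u):
    """Integral of the Epanechnikov kernel from -1 to u, clipped to [0, 1]."""
    u = np.clip(u, -1.0, 1.0)
    return 0.5 + 0.75 * (u - u ** 3 / 3.0)


def smle(t_eval, t_sorted, f_hat, h):
    """Smoothed MLE: integrated kernel summed against the jumps of the MLE."""
    t_eval = np.atleast_1d(np.asarray(t_eval, dtype=float))
    jumps = np.diff(np.concatenate(([0.0], f_hat)))  # mass of the MLE at each time
    u = (t_eval[:, None] - t_sorted[None, :]) / h
    return integrated_epanechnikov(u) @ jumps


if __name__ == "__main__":
    ts, fh = current_status_mle([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1])
    print(fh, smle(np.linspace(0.0, 5.0, 6), ts, fh, h=1.0))
```

Because the integrated kernel is nondecreasing and the jumps of the MLE are nonnegative, the smoothed estimator stays monotone while gaining the smoothness of the kernel.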
In this second edition of Counterfactuals and Causal Inference, completely revised and expanded, the essential features of the counterfactual approach to observational data analysis are presented with examples from the social, demographic, and health sciences. Alternative estimation techniques are first introduced using both the potential outcome model and causal graphs; after which, conditioning techniques, such as matching and regression, are presented from a potential outcomes perspective. For research scenarios in which important determinants of causal exposure are unobserved, alternative techniques, such as instrumental variable estimators, longitudinal methods, and estimation via causal mechanisms, are then presented. The importance of causal effect heterogeneity is stressed throughout the book, and the need for deep causal explanation via mechanisms is discussed.
In his 2009 book titled Causality: Models, Reasoning, and Inference, Judea Pearl lays out a powerful and extensive graphical theory of causality. Pearl's work provides a language and a framework for thinking about causality that differs from the potential outcome model presented in Chapter 2. Beyond the alternative terminology and notation, Pearl (2009, section 7.3) shows that the fundamental concepts underlying the potential outcome perspective and his causal graph perspective are equivalent, primarily because they both encode counterfactual causal states to define causality. Yet, each framework has value in elucidating different features of causal analysis, and we will explain these differences in this and subsequent chapters, aiming to convince the reader that these are complementary perspectives on the same fundamental issues.
Even though we have shown in the last chapter that the potential outcome model is simple and has great conceptual value, Pearl has shown that graphs nonetheless provide a direct and powerful way of thinking about full causal systems and the strategies that can be used to estimate the effects within them. Some of the advantage of the causal graph framework is precisely that it permits suppression of what could be a dizzying amount of notation to reference all patterns of potential outcomes for a system of causal relationships. In this sense, Pearl's perspective is a reaffirmation of the utility of graphical models in general, and its appeal to us is similar to the appeal of traditional path diagrams in an earlier era of social science research. Indeed, to readers familiar with path models, the directed graphs that we will present in this chapter will look familiar.