Two-part models describe situations in which the ordered choice is part of a two-stage decision process. In a typical situation, an individual decides whether or not to participate in an activity, then, if so, decides how much. The first decision is a binary choice. The intensity outcome can be of several types – what interests us here is an ordered choice. In the example below, an individual decides whether or not to be a smoker. The intensity outcome is how much they smoke. The sample selection model is one in which the participation “decision” relates to whether the data on the outcome variable will be observed, rather than whether the activity is undertaken. This chapter will describe several types of two-part and sample selection models.
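To fix ideas, here is a minimal sketch of how such a two-part (hurdle-style) likelihood could be assembled, assuming a probit participation equation and an ordered probit for intensity among participants. The function and variable names are ours, purely illustrative, and not taken from any particular application.

```python
import numpy as np
from scipy.stats import norm

def ordered_probit_probs(xb, cutpoints):
    """Cell probabilities P(y = j | x), j = 0..J, for an ordered probit."""
    cuts = np.concatenate(([-np.inf], np.asarray(cutpoints), [np.inf]))
    # Each cell is the difference of the normal CDF between adjacent thresholds.
    return np.diff(norm.cdf(cuts[None, :] - xb[:, None]), axis=1)

def two_part_loglik(y, d, x_part, x_int, gamma, beta, cutpoints):
    """Hurdle-style two-part log likelihood (illustrative).
    d = 1 if the individual participates (e.g., smokes), 0 otherwise;
    y = ordered intensity outcome, used only for participants."""
    p = norm.cdf(x_part @ gamma)                             # probit participation probability
    ll = np.sum(np.where(d == 1, np.log(p), np.log(1 - p)))  # participation contribution
    probs = ordered_probit_probs(x_int[d == 1] @ beta, cutpoints)
    ll += np.sum(np.log(probs[np.arange(probs.shape[0]), y[d == 1]]))  # intensity contribution
    return ll
```

In this hurdle formulation every zero response is a nonparticipant; the inflation models discussed next relax that by allowing zeros to arise from more than one source.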
Inflation models
Harris and Zhao (2007) analyzed a sample of 28,813 Australian individuals' responses to the question “How often do you now smoke cigarettes, pipes or other tobacco products?” (Data are from the Australian National Drug Strategy Household Survey, NDSHS (2001).) Responses were “zero, low, moderate, high,” coded 0, 1, 2, 3. The sample frequencies of the four responses were 0.75, 0.04, 0.14, and 0.07. The spike at zero shows a considerable excess of zeros compared to what might be expected in an ordered choice model. The authors reason that there are numerous explanations for a zero response: “genuine nonsmokers, recent quitters, infrequent smokers who are not currently smoking and potential smokers who might smoke when, say, the price falls.” It is also possible that the zero response includes some individuals who prefer to identify themselves as nonsmokers. The question is ambiguously worded, but arguably, the group of interest is the genuine nonsmokers.
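In a zero-inflated ordered probit of the kind Harris and Zhao describe, a zero can come either from the nonparticipation (splitting) equation or from a participant whose ordered intensity equals zero. Below is a minimal sketch of the implied cell probabilities, assuming independent disturbances in the two equations (Harris and Zhao also consider a correlated version, omitted here); the names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def ziop_probs(x_split, gamma, x_level, beta, cutpoints):
    """Zero-inflated ordered probit cell probabilities (sketch, independent errors).
    P(y=0) = P(nonparticipant) + P(participant) * P(intensity = 0)
    P(y=j) = P(participant) * P(intensity = j),  j = 1, ..., J"""
    p_part = norm.cdf(x_split @ gamma)                        # probit splitting equation
    cuts = np.concatenate(([-np.inf], np.asarray(cutpoints), [np.inf]))
    xb = x_level @ beta
    level = np.diff(norm.cdf(cuts[None, :] - xb[:, None]), axis=1)  # ordered probit cells
    probs = p_part[:, None] * level
    probs[:, 0] += 1.0 - p_part                               # zeros also arise from nonparticipants
    return probs                                              # each row sums to one
```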
This book began as a short note to propose the estimator in Section 8.3. In researching the recent developments in ordered choice modeling, we concluded that it would be useful to include some pedagogical material on the uses and interpretation of the model at the most basic level. Our review of the literature revealed an impressive breadth and depth of applications of ordered choice modeling, but no single source that provided a comprehensive summary. There are several somewhat narrow surveys of the basic ordered probit/logit model, including Winship and Mare (1984), Becker and Kennedy (1992), Daykin and Moffatt (2002), and Boes and Winkelmann (2006a), and a book-length treatment by Johnson and Albert (1999) that focuses on Bayesian estimation of the basic model parameters using grouped data. (See also Congdon (2005), Ch. 7, and Agresti (2002), Section 7.4.) However, these stop well short of examining the extensive range of variants of the model (bivariate and multivariate models, two-part models, duration models, panel data models, models with anchoring vignettes, semiparametric approaches, and so on) and the variety of fields in which it has been applied. (We have, of necessity, omitted mention of many, perhaps most, of the huge number of applications.) This motivated us to assemble this more complete overview of the topic. As the review proceeded, it struck us that a more thorough survey of the model itself, including its historical development, might also be useful and, we hope, interesting for readers.
The random utility model described in Chapter 1 is one of two essential building blocks that form the foundation for modeling ordered choices. The second fundamental pillar is the model for binary choices. The ordered choice model that will be the focus of the rest of this book is an extension of a model used to analyze the choice between two alternatives – whether the individual takes an action or does not, chooses one of two elemental alternatives, and so on. This chapter will develop the standard model for binary choices in considerable detail. Many of the results analyzed in the chapters to follow will then be straightforward extensions.
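As a concrete illustration of the standard binary choice setup, the sketch below simulates data from a latent-index probit and fits it with statsmodels; the data and variable names are invented for the example.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
X = sm.add_constant(rng.normal(size=(n, 2)))               # constant plus two covariates
beta_true = np.array([-0.5, 1.0, -0.8])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)   # latent index rule: y = 1[y* > 0]

res = sm.Probit(y, X).fit(disp=False)                      # maximum likelihood probit
print(res.summary())
print(res.get_margeff().summary())                         # average marginal effects
```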
There are numerous surveys available, including Amemiya (1981) and Greene (2008a, Ch. 23), and several book-length treatments such as Cox (1970) and Collett (1991). Our interest here is in the aspects of binary choice modeling that are likely to reappear in the analysis of ordered choices. We have therefore bypassed several topics that do appear in other treatments, notably semiparametric and nonparametric approaches, whose counterparts have not yet made significant inroads in ordered choice modeling. (Chapter 12 does contain some description of a few early entrants to this nascent literature.) This chapter also covers a long list of topics related to binary choice modeling, such as fit measures, multiple equation models, and sample selection, that are useful as components or building blocks in the analysis of ordered choices. Our intent in this chapter is to extend beyond conventional binary choice modeling and to provide a bridge to the somewhat more involved models for ordered choices.
The foregoing has surveyed nearly all of the literature on ordered choice modeling. We have, of course, listed only a small fraction of the received applications. However, the full range of methodological developments has been presented, with a single remaining exception. As in many other areas of econometrics, a thread of the contemporary literature has explored the boundaries of the model that are circumscribed by the distributional assumptions. We have limited ourselves to ordered logit and probit models, while relaxing certain assumptions such as homoscedasticity, all within the boundaries of the parametric model. The last strand of literature to be examined is the development of estimators that extend beyond the parametric distributional assumptions. It is useful to organize the overview around a few features of the model: scaling, the distribution of the disturbance, the functional form of the regression, and so on. In each of these cases, we can focus on applications that broaden the reach of the ordered choice model to less tightly specified settings.
There is a long, rich history of semiparametric and nonparametric analysis of binary choice modeling (far too long and rich to examine in depth in this already long survey) that begins in the 1970s, only a few years after analysis of individual binary data became a standard technique. The binary choice literature has two focal points, maximum score estimation (Manski (1975, 1985), Manski and Thompson (1985), and Horowitz (1992)) and the Klein and Spady (1993) kernel-based semiparametric estimator for binary choice. (As noted, there is a huge number of other papers on the subject. We are making no attempt to survey this literature.)
In this chapter, we will survey the elements of estimation, inference, and analysis with the ordered choice model. It will prove useful to develop an application as part of the discussion.
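For readers who want to follow the discussion in software, a minimal sketch of estimating an ordered probit on simulated data with statsmodels' OrderedModel is given below; the data-generating values are illustrative only.

```python
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=(n, 2))
y_star = x @ np.array([1.0, -0.5]) + rng.normal(size=n)    # latent regression
y = np.digitize(y_star, [-1.0, 0.0, 1.5])                  # thresholds give categories 0..3

mod = OrderedModel(y, x, distr='probit')                   # note: no constant in the regressors
res = mod.fit(method='bfgs', disp=False)
print(res.summary())                                       # slope coefficients and thresholds
```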
Application of the ordered choice model to self-assessed health status
Riphahn et al. (2003) analyzed individual data on health care utilization (doctor visits and hospital visits) using various models for counts. The data set is an unbalanced panel of 7,293 German households observed from one to seven times, for a total of 27,326 observations, extracted from the German Socioeconomic Panel (GSOEP). (See Riphahn et al. (2003) and Greene (2008a) for a detailed discussion of the data set.) Among the variables in this data set is HSAT, a self-reported health assessment recorded with values 0, 1, …, 10 (so J = 10). Figure 5.1 shows the distribution of outcomes for the full sample. The figure reports the variable NewHSAT, not the original variable: forty of the 27,326 observations on HSAT in the original data were coded with noninteger values between 6.5 and 6.95, and we have changed these forty observations to sevens. In order to construct a compact example that is sufficiently general to illustrate the technique, we will aggregate the categories shown as follows: (0–2) = 0, (3–5) = 1, (6–8) = 2, (9) = 3, (10) = 4. (One might expect collapsing the data in this fashion to sacrifice some information and, in turn, produce a less efficient estimator of the model parameters. See Murad et al. (2003) for some analysis of this issue.)
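The recoding just described could be carried out along the following lines; this is only a sketch, using a stand-in DataFrame and an assumed column name hsat rather than the actual GSOEP file.

```python
import pandas as pd

# Illustrative stand-in for the GSOEP extract; 'hsat' holds the 0-10 rating.
df = pd.DataFrame({'hsat': [0.0, 3.0, 5.0, 6.7, 6.95, 8.0, 9.0, 10.0]})

# A handful of observations were recorded with noninteger values between 6.5 and 6.95;
# following the text, recode them to 7 before grouping.
noninteger = df['hsat'] != df['hsat'].round()
df.loc[noninteger, 'hsat'] = 7
df['newhsat'] = df['hsat'].astype(int)

# Collapse 0-10 into five categories: (0-2)=0, (3-5)=1, (6-8)=2, (9)=3, (10)=4.
bins = [-1, 2, 5, 8, 9, 10]
df['health'] = pd.cut(df['newhsat'], bins=bins, labels=[0, 1, 2, 3, 4]).astype(int)
print(df)
```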
McKelvey and Zavoina's proposal was preceded by several earlier developments in the statistical literature. The chronology to follow does suggest, however, that their development produced a discrete step in the received body of techniques. The obvious starting point is the early work on probit methods in toxicology, beginning with Bliss (1934a) and made famous by Finney's (1947b) classic monograph on the subject. The ordered choice model that interests us here appears in three clearly discernible steps in the literature: Aitchison and Silvey's (1957) treatment of stages in the life cycle of a certain insect, Snell's (1964) analysis of ordered outcomes (without a regression interpretation), and McKelvey and Zavoina's (1975) proposal of the modern form of the “ordered probit regression model.” Some later papers, e.g., Anderson (1984), expanded on the basic models. Walker and Duncan (1967) is another discrete step in the direction of analyzing individual data.
The origin of probit analysis: Bliss (1934a), Finney (1947a)
Bliss (1934a) presented graphically the results of a laboratory study of the effectiveness of an insecticide. He plotted the relationship between the “Percent of Aphids Killed” on the ordinate and “Milligrams of Nicotine Per 100 ML of Spray” on the abscissa of a simple figure, reproduced here as Figure 4.1. The figure loosely traces out the familiar sigmoid shape of the normal CDF, and in a natural fashion provides data on the kill rate that can be expected for a given concentration of nicotine.
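Bliss's figure is, in modern terms, the raw material for a probit dose-response fit: the kill probability modeled as a normal CDF of (log) dose. The sketch below fits such a curve by maximum likelihood on invented dose-response data; the numbers are illustrative, not Bliss's.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Invented dose-response data: nicotine concentration, insects exposed, insects killed.
dose   = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
n_exp  = np.array([50, 50, 50, 50, 50])
killed = np.array([4, 12, 28, 41, 48])
x = np.log(dose)                                    # probit analysis conventionally uses log dose

def neg_loglik(params):
    a, b = params
    p = np.clip(norm.cdf(a + b * x), 1e-10, 1 - 1e-10)   # kill probability at each dose
    return -np.sum(killed * np.log(p) + (n_exp - killed) * np.log(1 - p))

a_hat, b_hat = minimize(neg_loglik, x0=[0.0, 1.0]).x
print("estimated intercept and slope:", a_hat, b_hat)
print("LD50 (dose giving a 50% kill rate):", np.exp(-a_hat / b_hat))
```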
Modern statistical methods use complex, sophisticated models that can lead to intractable computations. Saddlepoint approximations can be the answer. Written from the user's point of view, this book explains in clear language how such approximate probability computations are made, taking readers from the very beginnings to current applications. The core material is presented in chapters 1-6 at an elementary mathematical level. Chapters 7-9 then give a highly readable account of higher-order asymptotic inference. Later chapters address areas where saddlepoint methods have had substantial impact: multivariate testing, stochastic systems and applied probability, bootstrap implementation in the transform domain, and Bayesian computation and inference. No previous background in the area is required. Data examples from real applications demonstrate the practical value of the methods. Ideal for graduate students and researchers in statistics, biostatistics, electrical engineering, econometrics, and applied mathematics, this is both an entry-level text and a valuable reference.
This is a straightforward and practical introduction to statistics for students without any advanced knowledge of mathematics who need to use statistical techniques. The author provides a wide selection of effective tools of the trade so that the reader can tackle a whole variety of concrete situations. No attempt is made to give a comprehensive account of the subject. Enough technique is explained so that the reader can appreciate when there is a need to turn to more advanced material elsewhere. The basic mathematics required is at the level of senior secondary school education, although even less is necessary for the earlier chapters of the book.
This book was first published in 2004. Many observed phenomena, from the changing health of a patient to values on the stock market, are characterised by quantities that vary over time: stochastic processes are designed to study them. This book introduces practical methods of applying stochastic processes to an audience knowledgeable only in basic statistics. It covers almost all aspects of the subject and presents the theory in an easily accessible form that is highlighted by application to many examples. These examples arise from dozens of areas, from sociology through medicine to engineering. Complementing these are exercise sets making the book suited for introductory courses in stochastic processes. Software (available from www.cambridge.org) is provided for the freely available R system for the reader to apply to all the models presented.
Although both philosophers and scientists are interested in how to obtain reliable knowledge in the face of error, there is a gap between their perspectives that has been an obstacle to progress. By means of a series of exchanges between the editors and leaders from the philosophy of science, statistics and economics, this volume offers a cumulative introduction connecting problems of traditional philosophy of science to problems of inference in statistical and empirical modelling practice. Philosophers of science and scientific practitioners are challenged to reevaluate the assumptions of their own theories - philosophical or methodological. Practitioners may better appreciate the foundational issues around which their questions revolve and thereby become better 'applied philosophers'. Conversely, new avenues emerge for finally solving recalcitrant philosophical problems of induction, explanation and theory testing.
Chapter Preview. This chapter introduces regression where the dependent variable is the time until an event, such as the time until death, the onset of a disease, or the default on a loan. Event times are often limited by sampling procedures, so ideas of censoring and truncation of data are summarized in this chapter. Event times are nonnegative, and their distributions are described in terms of survival and hazard functions. Two types of hazard-based regression are considered: a fully parametric accelerated failure time model and a semiparametric proportional hazards model.
Introduction
In survival models, the dependent variable is the time until an event of interest. The classic example of an event is time until death (the complement of death being survival). Survival models are now widely applied in many scientific disciplines; other examples of events of interest include the onset of Alzheimer's disease (biomedical), time until bankruptcy (economics), and time until divorce (sociology).
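To make the survival and hazard functions concrete, the short sketch below evaluates them for a Weibull event-time distribution; the parameter values are arbitrary illustrations.

```python
import numpy as np
from scipy.stats import weibull_min

shape, scale = 1.5, 10.0                      # illustrative Weibull parameters
t = np.array([1.0, 5.0, 10.0, 20.0, 30.0])

surv = weibull_min.sf(t, shape, scale=scale)             # S(t) = P(T > t)
hazard = weibull_min.pdf(t, shape, scale=scale) / surv   # h(t) = f(t) / S(t)
cum_hazard = -np.log(surv)                               # H(t) = -log S(t)

for ti, s, h in zip(t, surv, hazard):
    print(f"t = {ti:5.1f}   S(t) = {s:.3f}   h(t) = {h:.4f}")
```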
Example: Time until Bankruptcy. Shumway (2001) examined the time to bankruptcy for 3,182 firms listed on Compustat Industrial File and the CRSP Daily Stock Return File for the New York Stock Exchange over the period 1962–92. Several explanatory financial variables were examined, including working capital to total assets, retained earnings to total assets, earnings before interest and taxes to total assets, market equity to total liabilities, sales to total assets, net income to total assets, total liabilities to total assets, and current assets to current liabilities. The dataset included 300 bankruptcies from 39,745 firm years.
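In the spirit of this example, the sketch below fits a semiparametric proportional hazards (Cox) regression with the lifelines package on simulated firm data; the covariates, coefficients, and censoring scheme are invented, and this is not intended to reproduce Shumway's own estimator.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500
wc_ta = rng.normal(size=n)                    # working capital / total assets (invented)
tl_ta = rng.normal(size=n)                    # total liabilities / total assets (invented)

# Simulate event times so the bankruptcy hazard falls with liquidity and rises with leverage.
time_to_event = rng.exponential(scale=10.0, size=n) * np.exp(0.5 * wc_ta - 0.7 * tl_ta)
censor_time = rng.uniform(5.0, 30.0, size=n)  # administrative censoring
event = (time_to_event <= censor_time).astype(int)
duration = np.minimum(time_to_event, censor_time)

df = pd.DataFrame({'duration': duration, 'event': event, 'wc_ta': wc_ta, 'tl_ta': tl_ta})
cph = CoxPHFitter()
cph.fit(df, duration_col='duration', event_col='event')
cph.print_summary()                           # hazard ratios for the two financial ratios
```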