A basic assumption in the construction of models from likelihood theory is that observations in the model are independent. This is a reasonable assumption for perhaps the majority of studies. However, it is not feasible for longitudinal studies; nor does it hold when data are clustered. For example, observations from a study on student drop-out can be clustered by the type of school sampled. If the study is related to intervention strategies, then affluent suburban, middle-class suburban, middle-class urban, and below-poverty-level schools have strategies that are more highly correlated within a school type than between types or groups. Likewise, if we have study data taken on a group of individual patients over time (e.g., treatment results obtained once per month for a year), the data for an individual across the various time periods are likely to be more highly correlated than are treatment results between patients. Any time the data can be grouped into clusters, or panels, of correlated groups, we must adjust the likelihood-based model (based on independent observations) to account for the extra correlation.
We have previously employed robust variance estimators and bootstrapped standard errors when faced with overdispersed count data. Overdispersed Poisson models were adjusted by using different types of negative binomial models, or by extending the basic Poisson model by adjusting the variance or by designing a new log-likelihood function to account for the specific cause of the overdispersion.
There are many times when certain data elements are lost, discarded, ignored, or otherwise excluded from analysis. Truncated and censored models have been developed to deal with these types of data. Each takes two forms: truncation or censoring from below, and truncation or censoring from above. Count model forms take their basic logic from truncated and censored continuous-response models, in particular from Tobit regression (Amemiya, 1984) and censored normal regression (Goldberger, 1983), respectively.
Count sample selection models also deal with data situations in which the distribution is confounded by an external condition. We shall address sample selection models at the end of the chapter.
The traditional parameterization used for truncated and censored count data can be called the econometric parameterization. This is the form of model discussed in standard econometric texts and is the form found in current econometric software implementations. I distinguish this from what I term a survival parameterization, the form of which is derived from standard survival models. This parameterization only relates to censored Poisson and censored negative binomial models. I shall first address the more traditional econometric parameterization. In addition, I shall not use subscripts for this chapter; they are understood as presented in the earlier chapters.
Censored and truncated models – econometric parameterization
Censored and truncated count models are related, with only a relatively minor algorithmic difference between the two. The essential difference relates to how response values beyond a user-defined cut point are handled.
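The algorithmic difference can be sketched numerically. In this minimal scipy sketch (the Poisson mean mu, cut point C, and the from-above convention are illustrative assumptions, not the chapter's notation), truncation renormalizes the retained probabilities, while censoring keeps the excluded values in the sample as the lump of tail probability at the cut point:

```python
import numpy as np
from scipy.stats import poisson

mu, C = 3.0, 5  # illustrative mean and user-defined cut point

# Truncated from above: counts above C never appear in the sample, so each
# observed y in {0, ..., C} is renormalized by P(Y <= C).
def trunc_logpmf(y, mu, C):
    return poisson.logpmf(y, mu) - poisson.logcdf(C, mu)

# Censored from above: counts at or above C are recorded as C, so a censored
# observation contributes the tail mass P(Y >= C) instead of a point mass.
def cens_loglik_term(y, mu, C):
    y = np.asarray(y)
    return np.where(y >= C,
                    poisson.logsf(C - 1, mu),  # log P(Y >= C)
                    poisson.logpmf(y, mu))

# Both constructions still distribute total probability one over the
# observable outcomes {0, ..., C}.
p_trunc = np.exp(trunc_logpmf(np.arange(C + 1), mu, C)).sum()
p_cens = np.exp(cens_loglik_term(np.arange(C + 1), mu, C)).sum()
```

The key contrast: truncation changes the sample space (and the normalizing constant), whereas censoring changes only the likelihood contribution of the observations beyond the cut point.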
The negative binomial is traditionally derived from a Poisson–gamma mixture model. However, the negative binomial may also be thought of as a member of the single-parameter exponential family of distributions. This family admits a characterization known as generalized linear models (GLMs), which summarizes each member of the family and, most importantly, applies to the negative binomial. This interpretation allows statisticians to apply to the negative binomial model the various goodness-of-fit tests and residual analyses that have been developed for GLMs.
Poisson regression is the standard method used to model count response data. However, the Poisson distribution assumes the equality of its mean and variance – a property that is rarely found in real data. Data that have greater variance than the mean are termed Poisson overdispersed, but are more commonly designated as simply overdispersed. Negative binomial regression is a standard method used to model overdispersed Poisson data.
When the negative binomial is used to model overdispersed Poisson count data, the distribution can be thought of as an extension to the Poisson model. Certainly, when the negative binomial is derived as a Poisson–gamma mixture, thinking of it in this way makes perfect sense. The original derivation of the negative binomial regression model stems from this manner of understanding it, and has continued to characterize the model to the present time.
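The Poisson–gamma mixture can be checked by simulation. In this sketch (the mean mu, heterogeneity parameter alpha, and sample size are invented for illustration), multiplicative gamma heterogeneity with mean 1 and variance alpha yields the NB-2 moments, with variance mu + alpha*mu**2 exceeding the mean:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, alpha, n = 4.0, 0.5, 200_000  # illustrative mean, heterogeneity, sample size

# Gamma frailty with shape 1/alpha and scale alpha has mean 1, variance alpha;
# scaling by mu gives a random Poisson rate with mean mu.
lam = mu * rng.gamma(shape=1 / alpha, scale=alpha, size=n)
y = rng.poisson(lam)

# The marginal distribution of y is NB-2:
#   E[Y] = mu,  Var[Y] = mu + alpha * mu**2,
# so the sample mean and variance should be close to 4 and 12 here.
sample_mean, sample_var = y.mean(), y.var()
```

Simulating the mixture rather than sampling the negative binomial directly makes the derivation concrete: the overdispersion is produced entirely by the unobserved gamma heterogeneity in the Poisson rate.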
As mentioned above, the negative binomial has recently been thought of as having an origin other than as a Poisson–gamma mixture.
I have indicated that extended negative binomial models are generally developed to solve either a distributional or variance problem arising in the base NB-2 model. Changes to the negative binomial variance function were considered in the last chapter. In this chapter, we address the difficulties that arise when there are either no possible zeros in the data, or when there are an excessive number.
Zero-truncated negative binomial
Often we are asked to model count data that structurally exclude zero counts. Hospital length of stay data are an excellent example of count data that cannot have a zero count. When a patient first enters the hospital, the count begins. Upon registration the length of stay is given as 1. There can be no 0 days – unless we are describing patients who do not enter the hospital, and this is a different model where there may be two generating processes. This type of model will be discussed later.
The Poisson and negative binomial distributions both include zeros. When data structurally exclude zero counts, the underlying probability distribution must preclude this outcome if it is to model the data properly. This is not to say that Poisson and negative binomial models are not commonly used to model such data; the point is that they should not be. The Poisson and negative binomial probability functions, and their respective log-likelihood functions, need to be amended to exclude zeros while ensuring that the probabilities of the amended distribution still sum to one.
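The amendment is a renormalization: each positive-count probability is divided by P(Y > 0). A minimal scipy sketch for the zero-truncated Poisson (the mean mu is illustrative; the same construction applies to the negative binomial):

```python
import numpy as np
from scipy.stats import poisson

mu = 2.5  # illustrative Poisson mean

# Zero-truncated Poisson: divide each positive-count probability by
# P(Y > 0) = 1 - exp(-mu), so the amended pmf sums to one over y = 1, 2, ...
def zt_poisson_pmf(y, mu):
    return poisson.pmf(y, mu) / (1.0 - poisson.pmf(0, mu))

# The per-observation log-likelihood term is correspondingly
#   y*log(mu) - mu - log(y!) - log(1 - exp(-mu)).
y = np.arange(1, 200)
p = zt_poisson_pmf(y, mu)
total = p.sum()            # ~ 1 over the truncated support
mean_zt = (y * p).sum()    # truncation shifts the mean up to mu/(1 - exp(-mu))
```

Note that truncation raises the mean above mu, which is why fitting an untruncated Poisson to zero-truncated data biases the estimates.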
Poisson regression is the standard, or base, count response regression model. We have seen in previous discussion that other count models deal with data that violate the assumptions carried by the Poisson model. Since the model plays such a central role in count response modeling, we begin with an examination of its derivation and structure, as well as how it can be parameterized to model rates. The concept of overdispersion is introduced in this chapter, together with two tests that have been used to assess its existence and strength.
Derivation of the Poisson model
A primary assumption is that of equidispersion, or the equality of the mean and variance functions. When the value of the variance exceeds that of the mean, we have what is termed overdispersion. Negative binomial regression is a standard way to deal with certain types of Poisson overdispersion; we shall find that there are a variety of negative binomial-based models, each of which addresses the manner in which overdispersion has arisen in the data. However, to fully appreciate the negative binomial model and its variations, it is important to have a basic understanding of the derivation of the Poisson model as well as of the logic of its interpretation.
Maximum likelihood models, as well as the canonical form members of generalized linear models, are ultimately based on an estimating equation derived from a probability distribution.
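As a concrete instance, the Poisson maximum likelihood estimate solves the score (estimating) equation X'(y - mu) = 0 with mu = exp(X*beta), which Newton–Raphson handles in a few lines. The design, coefficients, and sample size below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

# Newton-Raphson on the Poisson score equation X'(y - mu) = 0:
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)
    score = X.T @ (y - mu)               # gradient of the log-likelihood
    info = X.T @ (mu[:, None] * X)       # (expected) information matrix
    beta = beta + np.linalg.solve(info, score)

score_at_mle = X.T @ (y - np.exp(X @ beta))  # ~ 0 at convergence
```

The same scheme, with the appropriate mean and variance functions, is the iteratively reweighted least squares algorithm used for the canonical-link members of the GLM family.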
Chapter 1 introduced expressions which define the various saddlepoint approximations, along with enough supplementary information to allow the reader to begin making computations. This chapter develops some elementary properties of the approximations which lead to further understanding of the methods. Heuristic derivations for many of the approximations are presented.
Simple properties of the approximations
Some important properties possessed by saddlepoint density/mass functions and CDFs are developed below. Unless noted otherwise, the distributions involved throughout are assumed to have MGFs that are convergent on open neighborhoods of 0.
The first few properties concern a linear transformation of the random variable X to Y = σX + μ with σ ≠ 0. When X is discrete with integer support, then Y has support on a subset of the σ-lattice {μ, μ ± σ, μ ± 2σ, …}. Saddlepoint mass and CDF approximations for the resulting variable Y have not yet been defined, and there are a couple of ways to proceed. The more intriguing approach would be based on inversion theory for the probability masses; however, the difficulty of this approach places it beyond the scope of this text. A simpler and more expedient alternative is taken here, which adopts the following convention and leads to the same approximations.
Lattice convention. The saddlepoint mass function and CDF approximation for lattice variable Y, with support in {μ, μ ± σ,μ ± 2σ, …} for σ > 0, are specified in terms of their equivalents based on X = (Y − μ) /σ with support on the integer lattice.
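In display form, the convention simply passes the standardized argument through the integer-lattice approximations (hats denote the saddlepoint approximations; the notation here is a restatement, not the book's own displayed equation):

```latex
% Lattice convention for Y = \sigma X + \mu, \sigma > 0, via X = (Y-\mu)/\sigma:
\hat{p}_Y(y) = \hat{p}_X\!\left(\frac{y-\mu}{\sigma}\right), \qquad
\hat{F}_Y(y) = \hat{F}_X\!\left(\frac{y-\mu}{\sigma}\right), \qquad
y \in \{\mu,\ \mu \pm \sigma,\ \mu \pm 2\sigma, \dots\}.
```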
Approximations to continuous univariate CDFs of MLEs in curved exponential and transformation families have been derived in Barndorff-Nielsen (1986, 1990, 1991) and are often referred to as r* approximations. These approximations, along with their equivalent approximations of the Lugannani and Rice/Skovgaard form, are presented in the next two chapters. Section 8.2 considers the conditional CDF for the MLE of a scalar parameter given appropriate ancillaries. The more complex situation that encompasses a vector nuisance parameter is the subject of chapter 9.
Other approaches to this distribution theory, aimed more toward p-value computation, are also presented in section 8.5. Fraser and Reid (1993, 1995, 2001) and Fraser et al. (1999a) have suggested an approach based on geometrical considerations of the inference problem. In this approach, explicit ancillary expressions are not needed, which helps to simplify the computational effort. Along these same lines, Skovgaard (1996) also offers methods for CDF approximation that are quite simple computationally. Specification of ancillaries is again not necessary, and these methods are direct approximations to the procedures suggested by Barndorff-Nielsen above.
Expressions for these approximate CDFs involve partial derivatives of the likelihood with respect to the parameter, but also with respect to the MLE and other quantities while holding the approximate ancillary fixed. The latter partial derivatives are called sample space derivatives and can be difficult to compute. An introduction to these derivatives is given in the next section, and approximations to them, as suggested in Skovgaard (1996), are presented in appropriate sections.
The ratio R = U/V of two random variables U and V, perhaps dependent, admits a saddlepoint approximation through the joint MGF of (U, V). If V > 0 with probability one, then the Lugannani and Rice approximation may easily be applied to approximate the associated CDF. Saddlepoint density approximation based on the joint MGF uses the Geary (1944) representation for its density. This approach was first noted in Daniels (1954, §9) and is discussed in section 12.1 below.
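The device is that P(R ≤ r) = P(W ≤ 0) for W = U − rV, so the Lugannani and Rice formula is applied to W at the point 0. A sketch for an independent gamma ratio, where the exact answer is available through the beta distribution for comparison (the shapes a, b and the cutoff r are illustrative choices):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import beta, norm

# R = U/V with U ~ Gamma(a), V ~ Gamma(b) independent.  The CGF of
# W = U - r V is K(s) = -a*log(1-s) - b*log(1+r*s), for -1/r < s < 1.
a, b, r = 3.0, 5.0, 1.5

K = lambda s: -a * np.log(1 - s) - b * np.log(1 + r * s)
Kp = lambda s: a / (1 - s) - b * r / (1 + r * s)
Kpp = lambda s: a / (1 - s) ** 2 + b * r ** 2 / (1 + r * s) ** 2

# Saddlepoint equation K'(s_hat) = 0 (W evaluated at its cutoff 0).
s_hat = brentq(Kp, -1 / r + 1e-9, 1 - 1e-9)
w_hat = np.sign(s_hat) * np.sqrt(-2.0 * K(s_hat))
u_hat = s_hat * np.sqrt(Kpp(s_hat))

# Lugannani and Rice approximation to P(R <= r):
F_hat = norm.cdf(w_hat) + norm.pdf(w_hat) * (1 / w_hat - 1 / u_hat)

# Exact answer: U/(U+V) ~ Beta(a, b), so P(R <= r) = P(U/(U+V) <= r/(1+r)).
F_exact = beta.cdf(r / (1 + r), a, b)
```

With these values the approximation and the exact beta probability agree to roughly three decimal places, illustrating the accuracy the text describes.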
The ratio R is the root of the estimating equation U − RV = 0, and the distribution theory for ratios can be generalized to consider distributions for roots of general estimating equations. The results of section 12.1 are subsumed into the more general discussion of section 12.2, which provides approximate distributions for roots of general estimating equations. Saddlepoint approximations for these roots began in the robustness literature, where M-estimates are the roots of certain estimating equations and the interest was in determining their distributions when sample sizes are small. Hampel (1973), Field and Hampel (1982), and Field (1982) were instrumental in developing this general approach.
Saddlepoint approximation for a vector of ratios, for example (R1, R2, R3) = (U1/V, U2/V, U3/V), is presented in section 12.3 and generalizes the results of Geary (1944). An important class of such examples includes vector ratios of quadratic forms in normal variables. A particularly prominent example in time series, treated in detail, concerns approximation to the joint distribution of the sequence of lag correlations comprising the serial autocorrelation function.
In engineering reliability and multistate survival analysis, the machine or patient is viewed as a stochastic system or process which passes from one state to another over time. In many practical settings, the state space of this system is finite and the dynamics of the process are modelled as either a Markov process or alternatively as a semi-Markov process if aspects of the Markov assumption are unreasonable or too restrictive.
This chapter gives the CGFs connected with first passage times in general semi-Markov models with a finite number of states, as developed in Butler (1997, 2000, 2001). Different formulae apply to different types of passage times; however, all of these CGF formulae have one common feature: they are represented as explicit matrix expressions that are ratios of matrix determinants. When inverting these CGFs using saddlepoint methods, the required first and second derivatives are also explicit, so that the whole saddlepoint inversion becomes a simple explicit computation. These ingredients, when used with the parametric plot routine in Maple, lead to explicit plots of the first passage density or CDF that completely avoid the burden of solving the saddlepoint equation.
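The determinant-ratio structure can be sketched numerically. The cofactor-ratio form below follows the style of the formulae developed in Butler (1997); the specific two-stage Markov chain, its transition probabilities, and the evaluation point s are invented for illustration, and the result is checked against the closed-form MGF of a sum of two independent geometric waiting times:

```python
import numpy as np

# Discrete-time chain: 1 -> 1 w.p. a1, 1 -> 2 w.p. 1-a1;
#                      2 -> 2 w.p. a2, 2 -> 3 w.p. 1-a2; state 3 absorbing.
a1, a2 = 0.4, 0.7
s = 0.2  # illustrative point inside the convergence region of the MGF

# Transmittance matrix T(s): entry (i, j) is p_ij * exp(s) (one step per jump).
T = np.exp(s) * np.array([[a1, 1 - a1, 0.0],
                          [0.0, a2, 1 - a2],
                          [0.0, 0.0, 0.0]])
Psi = np.eye(3) - T

def first_passage_mgf(Psi, start, target):
    # Cofactor ratio: (-1)^(start+target) times
    #   det(Psi with row `target` and column `start` deleted)
    # / det(Psi with row `target` and column `target` deleted).
    num = np.delete(np.delete(Psi, target, axis=0), start, axis=1)
    den = np.delete(np.delete(Psi, target, axis=0), target, axis=1)
    return (-1) ** (start + target) * np.linalg.det(num) / np.linalg.det(den)

M = first_passage_mgf(Psi, start=0, target=2)

# Check: passage 1 -> 3 is the sum of two independent geometric waits, so M
# should equal the product of the two geometric MGFs (1-p)e^s / (1 - p e^s).
geom = lambda p, s: (1 - p) * np.exp(s) / (1 - p * np.exp(s))
```

Because the numerator and denominator determinants are explicit in s, their derivatives are too, which is what makes the saddlepoint inversion of these CGFs a routine computation.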
Distributions for first passage or return times to a single absorbing state are considered in section 13.2 along with many examples derived from various stochastic models that arise in engineering reliability and queueing theory. Distributions for first passage to a subset of states require a different CGF formula which is developed in section 13.3. Passage times for birth and death processes can be approached from a slightly different perspective due to the movement restriction to neighboring states.
Up to now, all of the saddlepoint formulas have involved the univariate normal density function φ(z) and its CDF Φ(z). These expressions are called normal-based saddlepoint approximations and, for the most part, they serve the majority of needs in a wide range of applications. In some specialized settings, however, greater accuracy may be achieved by using saddlepoint approximations developed around a different distributional base than the standard normal.
This chapter presents saddlepoint approximations that are based on distributions other than the standard normal distribution. Suppose this base distribution has density function λ(z) and CDF Λ(z) and define the saddlepoint approximations that use this distribution as (λ, Λ)-based saddlepoint approximations. Derivations and properties of such approximations are presented in section 16.1 along with some simple examples. Most of the development below is based on Wood et al. (1993).
Most prominent among the base distributions is the inverse Gaussian distribution. The importance of this base is that it provides very accurate probability approximations for certain heavy-tailed distributions in settings for which the usual normal-based saddlepoint approximations are not accurate. These distributions include various first passage times in random walks and queues in which the system is either unstable, so the first passage distribution may be defective, or stable and close to the border of stability. Examples include the first return distribution to state 0 in a random walk that is null persistent or close to being so in both discrete and continuous time. A second example considers a predictive Bayesian analysis for a Markov queue with infinite buffer capacity.