• How do Poisson models differ from traditional linear regression models?
• What are the distributional assumptions of the Poisson regression model? For any count model?
• What is the dispersion statistic? How is it calculated?
• What is the relationship of Poisson standard errors to the dispersion statistic?
• What is apparent overdispersion? How do we deal with it?
• How can a synthetic Monte Carlo Poisson model be developed?
• How are Poisson coefficients and rate-parameterized coefficients interpreted?
• What are marginal effects, partial effects, and discrete change with respect to count models?
Poisson regression is fundamental to the modeling of count data. It was the first model specifically used to model counts, and it still stands at the base of the many types of count models available to analysts. However, as emphasized in the last chapter, because the Poisson distribution assumes equidispersion, a Poisson model is usually unsatisfactory for real study data. It is sometimes possible to adjust the Poisson model to remedy under- or overdispersion, but often this is not possible. In this chapter, which is central to the book, we look at the nature of Poisson regression and provide guidelines on how to construct, interpret, and evaluate the fit of Poisson models. The majority of fit tests we use for a Poisson model will be applicable to the more advanced count models discussed later.
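As a rough illustration of two of the questions above, the following sketch builds a synthetic Poisson data set and computes the Pearson dispersion statistic (the Pearson chi-square divided by the residual degrees of freedom). It is a minimal sketch, not the book's own code: the coefficients, seed, sample size, and all function names (`rpois`, `fit_poisson`, `pearson_dispersion`) are assumptions chosen for illustration, and the model is fitted with a hand-written Newton-Raphson routine rather than a statistics library.

```python
import math
import random


def rpois(rng, mu):
    """Draw one Poisson variate (Knuth's multiplication method; fine for small mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson(x, y, iterations=25):
    """Newton-Raphson fit of log(mu) = b0 + b1*x; returns (b0, b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector X'(y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix X'WX with W = diag(mu)
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual degrees of freedom (n - 2)."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - 2)


# Hypothetical "true" parameters for the synthetic data.
TRUE_B0, TRUE_B1 = 1.0, 0.5
rng = random.Random(2014)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
y = [rpois(rng, math.exp(TRUE_B0 + TRUE_B1 * xi)) for xi in x]

b0, b1 = fit_poisson(x, y)
dispersion = pearson_dispersion(x, y, b0, b1)
rate_ratio = math.exp(b1)  # exponentiated coefficient, read as a rate ratio

print(f"b0={b0:.3f} b1={b1:.3f} rate ratio={rate_ratio:.3f} dispersion={dispersion:.3f}")
```

Because the data are generated from a true Poisson process, the dispersion statistic should land close to 1, and the estimated coefficients should land close to the values used to generate the data. Values well above 1 on real data would point to overdispersion.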
In this final chapter, I briefly discuss models that have been developed for data that are generally more complex than what we have thus far observed. The sections are meant to introduce you to these methods if you have not already read about them – or used them.
The following list shows the types of data situations with which you may be confronted, together with a type of model that can be used in such a situation (given in parentheses).
Types of Data and Problems Dealt with in This Chapter
• Data sets are small and count data highly unbalanced. (exact Poisson)
• Counts have been truncated or censored at the left, right (highest values), or middle areas of the distribution. (truncated and censored Poisson and NB models)
• The count response appears to have two or more components, each being generated by a different mechanism. (finite mixture model)
• One or more model predictors are ill-shaped, needing smoothing. (GAM smoothers)
• Counts are erratically distributed and do not appear to follow a parametric count distribution very well. (quantile count models)
• Data are longitudinal or clustered in nature, where observations are not independent. (panel models; e.g., GEE, GLMM)
• Data are nested in levels or are in a hierarchical structure. (multilevel models)
A very brief overview of the Bayesian modeling of count data will be presented in Section 9.8.
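To make the idea of truncation concrete, here is a small sketch of the simplest case, a Poisson distribution truncated at the left so that zero counts cannot occur. It assumes the standard zero-truncated form, in which each Poisson probability is divided by the probability of a nonzero count; the function names and the mean of 1.5 are illustrative choices, not taken from the chapter.

```python
import math


def poisson_pmf(y, mu):
    """Ordinary Poisson probability of observing the count y."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def zero_truncated_pmf(y, mu):
    """Poisson probability rescaled so that zero counts are excluded."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, mu) / (1.0 - math.exp(-mu))


mu = 1.5  # hypothetical mean of the untruncated distribution
counts = range(1, 60)  # upper limit is large enough that the tail is negligible
probs = [zero_truncated_pmf(y, mu) for y in counts]

total = sum(probs)
truncated_mean = sum(y * p for y, p in zip(counts, probs))

print(f"sum of probabilities = {total:.6f}")
print(f"mean of truncated counts = {truncated_mean:.4f}")
```

The rescaled probabilities again sum to 1, and the mean of the truncated counts, mu / (1 - exp(-mu)), is larger than mu because the zeros have been removed.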
• Why hasn't the PIG model been widely used before this?
• What types of data are best modeled using a PIG regression?
• How do we model data with a very high initial peak and long right skew?
• How do we know whether a PIG is a better-fitted model than negative binomial or Poisson models?
Poisson Inverse Gaussian Model Assumptions
The Poisson inverse Gaussian (PIG) model is similar to the negative binomial model in that both are mixture models. The negative binomial model is a mixture of Poisson and gamma distributions, whereas the PIG model is a mixture of Poisson and inverse Gaussian distributions.
Those of you who are familiar with generalized linear models will notice that there are three GLM continuous distributions: normal, gamma, and inverse Gaussian. The normal distribution is typically parameterized to a lognormal distribution when associated with count models, presumably because the log link forces the distribution to have only nonnegative values. The Poisson and negative binomial (both NB2 and NB1) models have log links. Recall that the negative binomial is a mixture of the Poisson and gamma distributions, with variances of μ and μ²/ν, respectively. We inverted ν so that there is a direct relationship between the mean, dispersion, and variance function. Likewise, the PIG is a mixture of Poisson and inverse Gaussian distributions, with an inverse Gaussian variance of μ³Φ.
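The variance functions implied by these mixtures can be compared side by side. The sketch below assumes the usual direct parameterization, in which the mixture variance is the Poisson part μ plus the mixing part, and a single dispersion parameter `alpha` stands in for both the inverted ν of the negative binomial and the Φ of the PIG. The function names and the value alpha = 0.5 are illustrative assumptions.

```python
def poisson_variance(mu):
    """Poisson: the variance equals the mean (equidispersion)."""
    return mu


def nb1_variance(mu, alpha):
    """NB1: the variance is linear in the mean."""
    return mu + alpha * mu


def nb2_variance(mu, alpha):
    """NB2 (Poisson-gamma mixture): the variance is quadratic in the mean."""
    return mu + alpha * mu ** 2


def pig_variance(mu, alpha):
    """PIG (Poisson-inverse Gaussian mixture): the variance is cubic in the mean."""
    return mu + alpha * mu ** 3


alpha = 0.5  # hypothetical dispersion parameter
for mu in (0.5, 1.0, 2.0, 5.0):
    print(
        f"mu={mu:4.1f}  Poisson={poisson_variance(mu):7.2f}  "
        f"NB1={nb1_variance(mu, alpha):7.2f}  "
        f"NB2={nb2_variance(mu, alpha):7.2f}  "
        f"PIG={pig_variance(mu, alpha):7.2f}"
    )
```

The cubic term is why the PIG can accommodate more overdispersion than the negative binomial at the same mean and dispersion value once the mean exceeds 1.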
Modeling Count Data is written for the practicing researcher who has a reason to analyze and draw sound conclusions from modeling count data. More specifically, it is written for an analyst who needs to construct a count response model but is not sure how to proceed.
A count response model is a statistical model for which the dependent, or response, variable is a count. A count is understood as a nonnegative discrete integer ranging from zero to some specified greater number. This book aims to be a clear and understandable guide to the following points:
• How to recognize the characteristics of count data
• Understanding the assumptions on which a count model is based
• Determining whether data violate these assumptions (e.g., overdispersion), why this is so, and what can be done about it
• Selecting the most appropriate model for the data to be analyzed
• Constructing a well-fitted model
• Interpreting model parameters and associated statistics
• Predicting counts, rate ratios, and probabilities based on a model
• Evaluating the goodness-of-fit for each model discussed
There is indeed a lot to consider when selecting the best-fitted model for your data. I will do my best in these pages to clarify the foremost concepts and problems unique to modeling counts. If you follow along carefully, you should have a good overview of the subject and a basic working knowledge needed for constructing an appropriate model for your study data.
• What are some of the foremost tests to determine whether a Poisson model is overdispersed?
• What is scaling? What does it do to a count model?
• Why should robust standard errors be used as a default?
• What is a quasi-likelihood model?
This chapter can be considered a continuation of Chapter 2. Few real-life Poisson data sets are truly equidispersed. Overdispersion to some degree is inherent in the vast majority of Poisson data. Thus, the real question deals with the amount of overdispersion in a particular model – is it statistically sufficient to require a model other than Poisson? This is one of the foremost questions we address in this chapter, together with how we assess fit and then adjust for the lack of it.
Basics of Count Model Fit Statistics
Most statisticians consider overdispersion the key problem when considering count model fit. That is, when thinking of the fit of a count model, an analyst typically attempts to evaluate whether a count model is extradispersed – which usually means overdispersed. If there is evidence of overdispersion in a Poisson model, the problem then is to determine what gives rise to it. If we can determine the cause, we can employ the appropriate model to use on the data.
Analysts have used a variety of tests to determine whether the model they used on their data actually fits.
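One widely used check is the Pearson dispersion statistic, and one simple adjustment is scaling, in which model standard errors are multiplied by the square root of that statistic. The following is a minimal sketch under stated assumptions: the data are synthetic counts drawn from a Poisson-gamma mixture, the model is an intercept-only Poisson model (so the fitted mean is just the sample mean), and the names `MU`, `ALPHA`, and `rpois` as well as the seed are hypothetical.

```python
import math
import random


def rpois(rng, mu):
    """Draw one Poisson variate (Knuth's multiplication method; fine for small mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(7)
MU, ALPHA, N = 3.0, 0.5, 5000  # hypothetical mean, dispersion parameter, sample size

# Gamma heterogeneity with mean 1 (shape 1/ALPHA, scale ALPHA) makes the
# counts overdispersed relative to the Poisson distribution.
y = [rpois(rng, MU * rng.gammavariate(1.0 / ALPHA, ALPHA)) for _ in range(N)]

# Intercept-only Poisson model: the fitted mean is the sample mean.
mean_y = sum(y) / N
pearson_chi2 = sum((yi - mean_y) ** 2 / mean_y for yi in y)
dispersion = pearson_chi2 / (N - 1)

# Model-based standard error of the intercept (log mean), then the scaled version.
model_se = 1.0 / math.sqrt(N * mean_y)
scaled_se = model_se * math.sqrt(dispersion)

print(f"dispersion = {dispersion:.3f}")
print(f"model SE = {model_se:.5f}, scaled SE = {scaled_se:.5f}")
```

With these settings the theoretical variance is MU + ALPHA * MU**2 = 7.5 against a mean of 3, so the dispersion statistic should sit near 2.5 and the scaled standard error should be noticeably larger than the model-based one.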
• What is the relationship between a probability distribution function (PDF) and a statistical model?
• What are the parameters of a statistical model? Where do they come from, and can we ever truly know them?
• How does a count model differ from other regression models?
• What are the basic count models, and how do they relate to one another?
• What is overdispersion, and why is it considered to be the fundamental problem when modeling count data?
What Are Counts?
When discussing the modeling of count data, it's important to clarify exactly what is meant by a count, as well as “count data” and “count variable.” The word “count” is typically used as a verb meaning to enumerate units, items, or events. We might count the number of road kills observed on a stretch of highway, how many patients died at a particular hospital within 48 hours of having a myocardial infarction, or how many separate sunspots were observed in March 2013. “Count data,” on the other hand, is a plural noun referring to observations made about events or items that are enumerated. In statistics, count data refer to observations that have only nonnegative integer values ranging from zero to some greater undetermined value. Theoretically, counts can range from zero to infinity, but they are always limited to some lesser distinct value – generally the maximum value of the count data being modeled.