In this chapter we provide a detailed discussion of empirical models for three examples based on four cross-sectional data sets. The first example analyzes the demand for medical care by the elderly in the United States and shares many features of health utilization studies based on cross-section data. The second example is an analysis of recreational trips. The third is an analysis of completed fertility – the total number of children born to a woman with a complete history of births.
Figure 6.1 presents histograms for the four count variables studied; the first two histograms exclude the highest percentile for readability. Physician visits appear roughly negative binomial, with a mild excess of zeros. Recreational trips have a very large excess of zeros. Completed fertility in both cases is bimodal, with modes at 0 and 2. Different count data models will most likely be needed for these different datasets.
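As a rough illustration of what an "excess of zeros" means, the sketch below compares the observed share of zero counts with the share a Poisson distribution with the same mean would imply, exp(-mean). The Poisson baseline, the function name `zero_excess`, and the `trips` data are assumptions made for illustration; they are not the chapter's data or its diagnostic.

```python
import math

def zero_excess(counts):
    """Return (observed share of zeros, share implied by a Poisson with the same mean)."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    poisson_implied = math.exp(-mean)  # P(Y = 0) for a Poisson with this mean
    return observed, poisson_implied

# Hypothetical trip counts: many zeros plus a few heavy users.
trips = [0] * 14 + [1, 2, 3, 5, 8, 11]
obs, implied = zero_excess(trips)
print(f"observed zeros: {obs:.2f}, Poisson-implied zeros: {implied:.2f}")
```

A large gap between the two shares suggests that a plain Poisson model would fit the zeros poorly.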
The applications presented in this chapter emphasize fully parametric models for counts, an issue discussed in section 6.2. Sections 6.3 to 6.5 deal, in turn, with each of the three empirical applications. The health care example in section 6.3 is the most extensive example and provides a lengthy treatment of model fitting, selection, and interpretation, with a focus on a finite mixture model. The recreational trips example in section 6.4 pays particular attention to special treatment of zero counts versus positive counts. The completed fertility illustration in section 6.5 is a nonregression example that emphasizes fitting a distribution that is bimodal. Section 6.6 pursues a methodological question concerning the distribution of the LR test under nonstandard conditions, previously raised in Chapter 4.8.5.
Count regressions with endogenous regressors occur frequently. Ignoring the feedback from the response variable to the endogenous regressor, and simply conditioning the outcome on variables with which it is jointly determined, leads in general to inconsistent parameter estimates. The estimation procedure should instead allow for stochastic dependence between the response variable and endogenous regressors. In considering this issue the existing literature on simultaneous equation estimation in nonlinear models is of direct relevance (T. Amemiya, 1985).
The empirical example of Chapter 3 models doctor visits as depending in part on the individual's type of health insurance. In Chapter 3 the health insurance indicator variables were treated as exogenous, but health insurance is frequently a choice variable rather than exogenously assigned. A richer model is a simultaneous model with a count outcome depending on endogenous variable(s) that may be binary (two insurance plans), multinomial (more than two insurance plans), or simply continuous.
This chapter deals with several classes of models with endogenous regressors, tailored to the outcome of interest being a count. It discusses estimation and inference for both fully parametric full-information methods and less parametric limited-information methods. These approaches are based on a multiple equation model in which that for the count outcome is of central interest, but there is also an auxiliary model for the endogenous regressor, sometimes called the first-stage or reduced-form equation. Estimation methods differ according to the detail in which the reduced form is specified and exploited in estimation.
Longitudinal data or panel data are observations on a cross-section of individual units such as persons, households, firms, and regions that are observed over several time periods. The data structure is similar to that of multivariate data considered in Chapter 8. Analysis is simpler than for multivariate data because for each individual unit the same outcome variable is observed, rather than several different outcome variables. Yet analysis is also more complex because this same outcome variable is observed at different points in time, introducing time series data considerations presented in Chapter 7.
In this chapter we consider longitudinal data analysis when the dependent variable is a count variable. Remarkably, many count regression applications are to longitudinal data rather than simpler cross-section data. Econometrics examples include the number of patents awarded to each of many individual firms over several years, the number of accidents in each of several regions, and the number of days of absence for each of many persons over several years. A political science example is the number of protests in each of several different countries over many years. A biological and health science example is the number of occurrences of a specific health event, such as a seizure, for each of many patients in each of several time periods.
This chapter is intended to provide a self-contained treatment of basic cross-section count data regression analysis. It is analogous to a chapter in a standard statistics text that covers both homoskedastic and heteroskedastic linear regression models.
The most commonly used count models are Poisson and negative binomial. For readers interested only in these models, it is sufficient to read sections 3.1 to 3.5, along with preparatory material in sections 1.2 and 2.2.
As indicated in Chapter 2, the properties of an estimator vary with the assumptions made on the dgp. By correct specification of the conditional mean or variance or density, we mean that the functional form and explanatory variables in the specified conditional mean or variance or density are those of the dgp.
The simplest regression model for count data is the Poisson regression model. For the Poisson MLE, the following can be shown:
Consistency requires correct specification of the conditional mean. It does not require that the dependent variable y be Poisson distributed.
Valid statistical inference using default computed maximum likelihood standard errors and t statistics requires correct specification of both the conditional mean and variance. This requires equidispersion, that is, equality of conditional variance and mean, but not Poisson distribution for y.
Valid statistical inference using appropriately computed standard errors is still possible if data are not equidispersed, provided the conditional mean is correctly specified.
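The three properties above can be illustrated with a small simulation. The sketch below assumes a conditional mean exp(b0 + b1*x), generates overdispersed counts with that mean from a gamma-Poisson mixture, fits the Poisson MLE by Newton-Raphson, and compares default maximum likelihood standard errors with sandwich (robust) standard errors. The data-generating values, function names, and sample size are all illustrative assumptions, not taken from the text.

```python
import math
import random

def poisson_draw(lam, rng):
    # Knuth's multiplication method; adequate for the modest means used here.
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def fit_poisson(x, y, iters=50, tol=1e-10):
    # Newton-Raphson for the Poisson MLE with mean exp(b0 + b1*x).
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        s0 = s1 = a00 = a01 = a11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)
            s0 += yi - mu                # score, intercept
            s1 += (yi - mu) * xi         # score, slope
            a00 += mu; a01 += mu * xi; a11 += mu * xi * xi  # minus the Hessian
        det = a00 * a11 - a01 * a01
        d0 = (a11 * s0 - a01 * s1) / det
        d1 = (a00 * s1 - a01 * s0) / det
        b0 += d0; b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def standard_errors(x, y, b0, b1):
    # Default ML variance is A^{-1}; the sandwich form is A^{-1} B A^{-1}.
    a00 = a01 = a11 = m00 = m01 = m11 = 0.0
    for xi, yi in zip(x, y):
        mu = math.exp(b0 + b1 * xi)
        r2 = (yi - mu) ** 2
        a00 += mu; a01 += mu * xi; a11 += mu * xi * xi
        m00 += r2; m01 += r2 * xi; m11 += r2 * xi * xi
    det = a00 * a11 - a01 * a01
    inv = [[a11 / det, -a01 / det], [-a01 / det, a00 / det]]
    meat = [[m00, m01], [m01, m11]]
    default = [math.sqrt(inv[i][i]) for i in range(2)]
    robust = [
        math.sqrt(sum(inv[i][j] * meat[j][k] * inv[k][i]
                      for j in range(2) for k in range(2)))
        for i in range(2)
    ]
    return default, robust

# Simulated data: correct conditional mean, but variance exceeds the mean.
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(2000)]
y = [poisson_draw(math.exp(0.5 + 0.5 * xi) * rng.gammavariate(1.0, 1.0), rng) for xi in x]

b0, b1 = fit_poisson(x, y)
default_se, robust_se = standard_errors(x, y, b0, b1)
print("estimates:", b0, b1)
print("default SEs:", default_se)
print("robust SEs:", robust_se)
```

In this setup the coefficient estimates should land near the values used to generate the data even though y is not Poisson, while the default standard errors understate the sampling variability relative to the sandwich form.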
Bayesian methods provide a quite different way to view statistical inference and model selection and to incorporate prior information on model parameters. These methods have become increasingly popular over the past 20 years due to methodological advances, notably Markov chain Monte Carlo methods, and increased computational power.
Some applied studies in econometrics are fully Bayesian. Others merely use Bayesian methods as a tool to enable statistical inference in the classical frequentist maximum likelihood framework for likelihood-based models that are difficult to estimate using other methods such as simulated maximum likelihood.
Section 12.2 presents the basics of Bayesian analysis. Section 12.3 presents some results for Poisson models. Section 12.4 covers Markov chain Monte Carlo methods that are now the common way to implement Bayesian analysis when analytically tractable results cannot be obtained, and it provides an illustrative example. Section 12.5 summarizes Bayesian models for various types of count data. Section 12.6 concludes with a more complicated illustrative example, a count version of the Roy model that allows for endogenous selection.
BAYESIAN APPROACH
The Bayesian approach treats the parameters θ as unknown random variables, with inference on θ to be based both on the data y and on prior beliefs about θ. The data and prior beliefs are combined to form the posterior density of θ given y, and Bayesian inference is based on this posterior. This section presents a brief summary, with further details provided in subsequent sections.
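A minimal sketch of combining data and prior beliefs into a posterior, assuming independent Poisson counts y with mean θ and a gamma prior on θ with shape `a` and rate `b`. This conjugate pairing is chosen here only because the posterior has a closed form; the function name and the numbers are hypothetical.

```python
def gamma_poisson_posterior(a, b, counts):
    """Gamma(shape a, rate b) prior with iid Poisson counts gives a
    Gamma(a + sum(y), b + n) posterior; return its (shape, rate)."""
    return a + sum(counts), b + len(counts)

counts = [0, 2, 1, 3, 0, 1]          # hypothetical data y
prior_shape, prior_rate = 2.0, 1.0   # hypothetical prior beliefs about theta
shape, rate = gamma_poisson_posterior(prior_shape, prior_rate, counts)
posterior_mean = shape / rate
print(shape, rate, posterior_mean)
```

The posterior mean lies between the prior mean (`a / b`) and the sample mean of the counts, and moves toward the sample mean as more observations arrive.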
This chapter presents the general modeling approaches most often used in count data analysis – likelihood-based, generalized linear models, and moment-based. Statistical inference for these nonlinear regression models is based on asymptotic theory, which is also summarized.
The models and results vary according to the strength of the distributional assumptions made. Likelihood-based models and the associated maximum likelihood estimator require complete specification of the distribution. Statistical inference is usually performed under the assumption that the distribution is correctly specified.
A less parametric analysis assumes that some aspects of the distribution of the dependent variable are correctly specified, whereas others are not specified or, if they are specified, are potentially misspecified. For count data models, considerable emphasis has been placed on analysis based on the assumption of correct specification of the conditional mean or of correct specification of both the conditional mean and the conditional variance. This is a nonlinear generalization of the linear regression model, in which consistency requires correct specification of the mean and efficient estimation requires correct specification of the mean and variance. It is a special case of the class of generalized linear models that is widely used in the statistics literature. Estimators for generalized linear models coincide with maximum likelihood estimators if the specified density is in the linear exponential family. But even then the analytical distribution of the same estimator can differ across the two approaches if different second moment assumptions are made.
The medium access control (MAC) layer provides, among other things, addressing and channel access control that makes it possible for multiple stations on a network to communicate. IEEE 802.11 is often referred to as wireless Ethernet and, in terms of addressing and channel access, 802.11 is indeed similar to Ethernet, which was standardized as IEEE 802.3. As a member of the IEEE 802 LAN family, IEEE 802.11 makes use of the IEEE 802 48-bit global address space, making it compatible with Ethernet at the link layer. The 802.11 MAC also supports shared access to the wireless medium through a technique called carrier sense multiple access with collision avoidance (CSMA/CA), which is similar to the original (shared medium) Ethernet’s carrier sense multiple access with collision detection (CSMA/CD). With both techniques, if the channel is sensed to be “idle,” the station is permitted to transmit, but if the channel is sensed to be “busy” then the station defers its transmission. However, because Ethernet and 802.11 operate over very different media, the two access techniques differ in some respects.
The Ethernet channel access protocol is essentially to wait for the medium to go “idle,” begin transmitting and, if a collision is detected while transmitting, to stop transmitting and begin a random backoff period. It is not feasible for a transmitter to detect a collision while transmitting in a wireless medium; thus the 802.11 channel access protocol attempts to avoid collisions. Once the medium goes “idle,” the station waits a random period during which it continues to sense the medium, and if at the end of that period the medium is still “idle,” it begins transmitting. The random period reduces the chances of a collision since another station waiting to access the medium would likely choose a different period, hence the collision avoidance aspect of CSMA/CA.
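The effect of the random waiting period can be sketched with a toy simulation. It assumes that every contending station draws a uniform random number of slots from a window of size `window`, that the station with the smallest draw transmits first, and that a collision occurs only when two or more stations share that smallest draw. This is a deliberate simplification for illustration: the function names and parameters are hypothetical, and it does not reproduce the full 802.11 access procedure.

```python
import random

def contention_round(n_stations, window, rng):
    """One contention round: return True if the smallest backoff draw is shared,
    that is, if two or more stations would start transmitting together."""
    draws = [rng.randrange(window) for _ in range(n_stations)]
    return draws.count(min(draws)) > 1

def collision_rate(n_stations, window, rounds=20000, seed=1):
    """Estimate the share of contention rounds that end in a collision."""
    rng = random.Random(seed)
    hits = sum(contention_round(n_stations, window, rng) for _ in range(rounds))
    return hits / rounds

for window in (1, 8, 64):
    print(window, collision_rate(5, window))
```

With a window of one slot every station picks the same period and always collides; widening the window makes it less likely that two stations pick the same smallest period.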
One of the functional requirements in the development of the 802.11n standard was that some modes of operation must be backward compatible with 802.11a (and 802.11g if 2.4 GHz was supported) as described in Stephens (2005). Furthermore, the 802.11n standard development group also decided that interoperability should occur at the physical layer. This led to the definition of a mandatory mixed format (MF) preamble in 802.11n. In this chapter, we first review the 802.11a packet structure, transmit procedures, and receive procedures to fully understand the issues in creating a preamble that is interoperable between 802.11a and 802.11n devices. For further details regarding 802.11a beyond this review, refer to clause 18 in IEEE (2012). Following this overview, the mixed format preamble, which is part of 802.11n, is discussed.
802.11a packet structure review
The 802.11a packet structure is illustrated in Figure 4.1.
The Short Training field (STF) is used for start-of-packet detection and automatic gain control (AGC) setting. In addition, the STF is also used for initial frequency offset estimation and initial time synchronization. This is followed by the Long Training field (LTF), which is used for channel estimation and for more accurate frequency offset estimation and time synchronization. Following the LTF is the Signal field (SIG), which contains the rate and length information for the packet. Example rates are BPSK, rate ½ encoding and 64-QAM, rate ¾ encoding. Following this is the Data field. The first 16 bits of the Data field contain the Service field. An example of an 802.11a transmit waveform is given in Figure 4.2.
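The field sequence above can be turned into a rough packet-duration calculation. The sketch assumes the standard 802.11a timing for a 20 MHz channel, which is not stated in the text: 8 µs for the STF, 8 µs for the LTF, 4 µs for the SIG, and 4 µs per data OFDM symbol, with the Data field carrying the 16 Service bits, the payload, and 6 tail bits, padded to a whole number of symbols. The function name and the example payload length are hypothetical.

```python
STF_US, LTF_US, SIG_US, SYMBOL_US = 8, 8, 4, 4   # assumed 802.11a field durations
SERVICE_BITS, TAIL_BITS = 16, 6

# Data bits per OFDM symbol for the two example rates mentioned in the text.
N_DBPS = {"BPSK 1/2": 24, "64-QAM 3/4": 216}

def packet_duration_us(payload_bytes, n_dbps):
    """Duration of an 802.11a packet: STF + LTF + SIG + padded Data field."""
    data_bits = SERVICE_BITS + 8 * payload_bytes + TAIL_BITS
    n_symbols = -(-data_bits // n_dbps)   # ceiling division pads to a full symbol
    return STF_US + LTF_US + SIG_US + SYMBOL_US * n_symbols

for rate, n_dbps in N_DBPS.items():
    print(rate, packet_duration_us(1500, n_dbps), "us")
```

The 20 µs of training and signaling is fixed, so it is a much larger share of the packet at the higher rate than at the lower one.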
With the addition of MIMO to IEEE 802.11, many new WLAN devices will have multiple antennas. Though an important benefit of multiple antennas is increased data rate with multiple spatial streams, multiple antennas may also be used to significantly improve the robustness of the system. Multiple antennas enable optional features such as receive diversity, spatial expansion, transmit beamforming, and space-time block coding (STBC). The topic of transmit beamforming is addressed in Chapter 13.
Advanced coding has also been added to 802.11n to further improve link robustness with the inclusion of the optional low density parity check (LDPC) codes and STBC. STBC combines multiple antennas with coding.
To simplify notation in the following sections, the system descriptions for receive diversity, STBC, and spatial expansion are given in the frequency domain for a single subcarrier. This is done since each technique is in fact applied to each subcarrier in the frequency band. It is assumed that, at the transmitter, the frequency domain data is transformed into a time domain waveform as described in Section 4.2. Furthermore, the receive procedure to generate frequency domain samples is described in Section 4.2.4.
To quantify the benefits of these features, this chapter contains simulation results modeling each technique. For each function, the simulation results include physical layer impairments, as described in Section 3.5. The equalizer is based on MMSE. Synchronization, channel estimation, and phase tracking are included in the simulation. For receive diversity, spatial expansion, and STBC, the simulations are performed using channel model B with NLOS conditions, as described in Section 3.5. For LDPC, the simulation is performed using channel model D with NLOS conditions.
This paper presents a modified passive dynamic walking model with hip friction. We add Coulomb friction to the hip joint of a two-dimensional straight-legged passive dynamic walker. The walking map is divided into two parts – the swing phase and the impact phase. Coulomb friction and impact make the model's dynamic equations nonlinear and non-smooth, and a numerical algorithm is given to deal with this model. We study the effects of hip friction on gait and obtain basins of attraction for different coefficients of friction.