
A regression model for pooled data in a two-stage survey under informative sampling with application for detecting and estimating the presence of transgenic corn

Published online by Cambridge University Press: 23 March 2016

Osval A. Montesinos-López
Affiliation:
Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), México, México
Kent Eskridge
Affiliation:
University of Nebraska, Statistics Department, Lincoln, Nebraska, USA
Abelardo Montesinos-López
Affiliation:
Departamento de Estadística, Centro de Investigación en Matemáticas (CIMAT), Guanajuato, Guanajuato, México
José Crossa*
Affiliation:
Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), México, México
Moises Cortés-Cruz
Affiliation:
Lab. de ADN y Genómicas, Centro Nacional de Recursos Genéticos, Instituto Nacional de Investigaciones Forestales, Agrícolas y Pecuarias (INIFAP)
Dong Wang
Affiliation:
Dow AgroSciences, Indianapolis, Indiana, USA
*Correspondence Email: j.crossa@cgiar.org

Abstract

Group-testing regression methods are effective for estimating and classifying binary responses and can substantially reduce the number of diagnostic tests required. However, no appropriate methodology exists when the sampling process is complex and informative. In such cases, researchers often ignore stratification and sampling weights, which can severely bias estimates of population parameters. In this paper, we develop group-testing regression models for analysing two-stage surveys with unequal selection probabilities and informative sampling. Weights are incorporated into the likelihood function using the pseudo-likelihood approach. A simulation study shows that the proposed model considerably reduces estimation bias compared with methods that ignore the weights. Finally, we apply the model to estimate the presence of transgenic corn in Mexico and provide the SAS code used for the analysis.
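How weights enter a group-testing pseudo-likelihood can be illustrated with a deliberately simplified case. The sketch below is a hypothetical illustration, not the authors' model or their SAS code: it assumes an intercept-only logistic model without the cluster-level random effect, a perfect assay (no false positives or negatives), and one known weight w_k per pool (for instance, the common weight of the plants pooled together). A pool of s plants then tests positive with probability 1 − (1 − p)^s, where p = 1/(1 + exp(−β_0)), and the pseudo-log-likelihood sums w_k times each pool's log-likelihood contribution. All function names are invented for this example.

```python
import math
from typing import List, Tuple

# Hypothetical pool record: (test result 0/1, pool size s_k, pool weight w_k)
Pool = Tuple[int, int, float]


def pool_log_probs(beta0: float, s: int) -> Tuple[float, float]:
    """Return (log P(pool positive), log P(pool negative)) for a pool of size s.

    Assumes a logistic model p = 1 / (1 + exp(-beta0)) for every plant and a
    perfect assay, so P(pool negative) = (1 - p)^s.
    """
    p = 1.0 / (1.0 + math.exp(-beta0))
    log_neg = s * math.log1p(-p)               # log (1 - p)^s, stable for small p
    log_pos = math.log(-math.expm1(log_neg))   # log (1 - (1 - p)^s)
    return log_pos, log_neg


def pseudo_loglik(beta0: float, pools: List[Pool]) -> float:
    """Weighted (pseudo) log-likelihood: sum over pools of w_k * log f(T_k)."""
    total = 0.0
    for t, s, w in pools:
        log_pos, log_neg = pool_log_probs(beta0, s)
        total += w * (log_pos if t else log_neg)
    return total


def fit_beta0(pools: List[Pool], lo: float = -12.0, hi: float = 2.0,
              tol: float = 1e-8) -> float:
    """Maximise the pseudo-log-likelihood over beta0 by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = pseudo_loglik(c, pools), pseudo_loglik(d, pools)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = pseudo_loglik(c, pools)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = pseudo_loglik(d, pools)
    return (a + b) / 2.0


if __name__ == "__main__":
    # Hypothetical data: 10 pools of 10 plants, 3 positive, equal weights.
    pools = [(1, 10, 1.0)] * 3 + [(0, 10, 1.0)] * 7
    print(fit_beta0(pools))
```

With equal weights the estimate reduces to ordinary maximum likelihood (here 1 − (1 − p)^10 = 0.3), and multiplying every weight by the same constant leaves the point estimate unchanged; only the relative weights, which differ under informative sampling, move it.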

Information

Type
Research Papers
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © Cambridge University Press 2016

Table 1. Comparison of four scenarios: informative sampling at both levels, at the cluster level only and at the individual level only, and non-informative sampling. Simulation means and standard deviations (Std) of the point estimators of the intercept (true value β_0 = −4.4631) and the second-level standard deviation (true value σ_b = 0.9944). Cluster sample m = 36 (12 from stratum 1 and 24 from stratum 2) selected with probability proportional to size (PPS). Elementary unit sample size n_j = 100 (50 from stratum 1 and 50 from stratum 2) selected by simple random sampling (SRS). s denotes pool size. 600 simulations were performed for each scenario. Method 1: unweighted maximum likelihood; method 2: pseudo-maximum likelihood (PML) using raw weights at the cluster level; method 3: PML using raw weights at both levels; method 4: PML using raw weights at the cluster level and scaling method A at the individual level; method 5: PML using raw weights at the cluster level and scaling method B; method 6: PML using method D with weights at the cluster level.
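Tables 1 to 3 contrast raw and scaled individual-level weights. The excerpt does not define scaling methods A, B and D, so the sketch below assumes the two rescalings conventionally labelled A and B in the multilevel-weighting literature: A rescales the within-cluster weights so they sum to the cluster sample size n_j, and B rescales them so they sum to the effective sample size (Σw)²/Σw². This is an illustration under that assumption, not the authors' definitions, and method D is not attempted. The function names and the example weights are hypothetical.

```python
from typing import List


def scale_method_a(w: List[float]) -> List[float]:
    """Rescale within-cluster weights so they sum to the cluster sample size n_j."""
    n_j = len(w)
    total = sum(w)
    return [wi * n_j / total for wi in w]


def scale_method_b(w: List[float]) -> List[float]:
    """Rescale so weights sum to the effective sample size (sum w)^2 / sum w^2."""
    s1 = sum(w)
    s2 = sum(wi * wi for wi in w)
    return [wi * s1 / s2 for wi in w]


if __name__ == "__main__":
    # Hypothetical cluster: 50 plants from a stratum with raw weight 4 and
    # 50 plants from a stratum with raw weight 12 (n_j = 100).
    w = [4.0] * 50 + [12.0] * 50
    a = scale_method_a(w)  # stratum weights become 0.5 and 1.5, summing to 100
    b = scale_method_b(w)  # weights sum to 800**2 / 8000 = 80
    print(a[0], a[-1], round(sum(b), 6))
```

Both rescalings preserve the ratio between individual weights within a cluster; they differ only in the total, which is what changes the balance between the individual-level and cluster-level parts of the pseudo-likelihood.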


Table 2. Simulation means and standard deviations (Std) of the point estimators for the model with a covariate at the individual level (true values β_0 = −4.7598, β_1 = 0.8290 and σ_b = 0.9820). Cluster sample m = 24 (8 from stratum 1 and 16 from stratum 2) under PPS. Elementary unit sample size n_j = 100 (50 from stratum 1 and 50 from stratum 2) under SRS. s denotes pool size. 600 simulations were performed for each scenario. Method 1: unweighted maximum likelihood; method 2: PML using raw weights at the cluster level; method 3: PML using raw weights at both levels; method 4: PML using raw weights at the cluster level and scaling method A at the individual level; method 5: PML using raw weights at the cluster level and scaling method B; method 6: PML using method D with weights at the cluster level.


Table 3. Comparison of three elementary unit sample sizes (n_j) selected under SRS. Simulation means and standard deviations (Std) of the point estimators of the intercept (true value β_0 = −4.4631) and the second-level standard deviation (true value σ_b = 0.9944). Cluster sample m = 24 (8 from stratum 1 and 16 from stratum 2) under PPS. s denotes pool size. 600 simulations were performed for each scenario. Method 4: PML using raw weights at the cluster level and scaling method A at the individual level; method 5: PML using raw weights at the cluster level and scaling method B; method 6: PML using method D with weights at the cluster level.


Table 4. Population data comprising two regions, eight fields (clusters) and two strata per field (fertility levels, FL). y is the binary response of each plant; n_{ihh*} denotes the total number of plants for each combination of region, field and FL; N_{ih} denotes the total number of plants per field in each stratum; and N_h denotes the total number of plants per region.


Table 5. Sample obtained by selecting two fields within each region under PPS and then drawing a stratified sample within each field, with a fixed sample size of six plants per stratum selected under SRS.
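The two-stage design in Table 5 implies a sampling weight at each level. A minimal sketch, assuming a fixed-size PPS design without replacement whose first-stage inclusion probability is π_i = m_h N_i / N_h (m_h fields drawn in region h, N_i plants in field i, N_h plants in the region), followed by SRS of n plants from a fertility stratum of N plants. The cluster-level weight is then 1/π_i, the within-field weight is N/n, and the overall plant weight is their product. The function name and the numbers in the example are hypothetical and are not the values in Table 4 or Table 5.

```python
from typing import Tuple


def two_stage_weights(field_size: int, region_size: int, fields_per_region: int,
                      stratum_size: int, plants_per_stratum: int = 6
                      ) -> Tuple[float, float]:
    """Return (cluster-level weight, within-field weight) for one sampled plant.

    Stage 1: fields drawn with probability proportional to size, assuming a
    fixed-size pi-ps design with pi_i = m_h * N_i / N_h.
    Stage 2: plants_per_stratum plants drawn by SRS from a stratum of
    stratum_size plants, so the conditional inclusion probability is n / N.
    """
    pi_field = fields_per_region * field_size / region_size
    if not 0.0 < pi_field <= 1.0:
        raise ValueError("pi-ps inclusion probability must lie in (0, 1]")
    if not 0 < plants_per_stratum <= stratum_size:
        raise ValueError("cannot sample more plants than the stratum holds")
    w_cluster = 1.0 / pi_field
    w_within = stratum_size / plants_per_stratum
    return w_cluster, w_within


if __name__ == "__main__":
    # Hypothetical numbers: a 100-plant field in a 400-plant region (m_h = 2),
    # and a 60-plant fertility stratum from which 6 plants are sampled.
    wc, ww = two_stage_weights(100, 400, 2, 60)
    print(wc, ww, wc * ww)  # prints 2.0 10.0 20.0
```

Keeping the two factors separate, rather than returning only their product, matches the distinction in Tables 1 to 3 between weighting at the cluster level only and weighting at both levels.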


Table 6. Sample arranged into pools for analysis
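Table 6 rearranges the plant-level sample into pools. A minimal sketch of that step, assuming (this is not stated in the excerpt) that pools are formed within each field-by-stratum cell, so every plant in a pool shares the same weights, and that a pool tests positive whenever it contains at least one positive plant. The function name and example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_pools(y: Sequence[int], s: int) -> List[Tuple[int, int]]:
    """Group plant-level 0/1 responses into consecutive pools of size s.

    Returns a list of (pool_result, pool_size). A pool is positive (1) when at
    least one plant in it is positive; the last pool may hold fewer than s plants.
    """
    if s < 1:
        raise ValueError("pool size s must be at least 1")
    pools = []
    for start in range(0, len(y), s):
        chunk = y[start:start + s]
        pools.append((int(any(chunk)), len(chunk)))
    return pools


if __name__ == "__main__":
    # Hypothetical stratum of six sampled plants (one transgenic), pooled in threes.
    print(make_pools([0, 0, 1, 0, 0, 0], 3))  # [(1, 3), (0, 3)]
```

Forming pools inside a single cell is what allows one weight to be attached to each pool when the pooled results are fed into a weighted pseudo-likelihood.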


Table 7. SAS PROC NLMIXED and PROC GLIMMIX statements for two-stage group-testing regression under PPS