Estimating disease prevalence using census data

Published online by Cambridge University Press: 30 November 2007

M. CHOY
Affiliation:
Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA
P. SWITZER
Affiliation:
Department of Statistics, Sequoia Hall, Stanford University, Stanford, CA, USA
C. De MARTEL
Affiliation:
Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA; Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
J. PARSONNET*
Affiliation:
Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA; Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
*Author for correspondence: J. Parsonnet, M.D., Stanford University, 300 Pasteur Dr., Grant Bldg, S-169, Stanford, CA 94305-5107, USA. (Email: parsonnt@stanford.edu)

Summary

We describe a method that uses publicly available data to estimate disease prevalence in small geographic areas, with Helicobacter pylori as a model infection. Using data from the Third National Health and Nutrition Examination Survey (NHANES III), we obtained risk parameters for H. pylori infection by logistic regression and validated them in an independent cohort, where the model predicted 737·5 infections against 736 observed. We then estimated the prevalence of H. pylori infection in the San Francisco Bay Area by applying the predictive logistic model to individual-level 1990 U.S. Census data. Predicted H. pylori prevalence correlated positively with gastric cancer incidence obtained from the Northern California Cancer Center (P<0·001, R²=0·87) and showed no statistically significant association with other malignancies. Because they rely exclusively on publicly available data, these methods may be applied to selected conditions with strong demographic predictors.
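The prediction step summarised above can be illustrated with a minimal sketch: each individual record is passed through a logistic model, and the resulting probabilities are summed to give an expected number of infections for the area. The coefficient values, the covariate names (`age_decades`, `low_income`) and the three example records are hypothetical placeholders chosen for illustration; they are not the estimates reported in Table 2, and the published model includes more covariates and interaction terms.

```python
import math

# Hypothetical coefficients for illustration only (NOT the Table 2 estimates).
COEFFS = {"intercept": -1.0, "age_decades": 0.3, "low_income": 0.5}


def infection_probability(person, coeffs=COEFFS):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = coeffs["intercept"] + sum(coeffs[name] * value for name, value in person.items())
    return 1.0 / (1.0 + math.exp(-z))


def predicted_prevalence(people, coeffs=COEFFS):
    """Return (expected number of infections, predicted prevalence)."""
    probs = [infection_probability(p, coeffs) for p in people]
    expected = sum(probs)  # expected count is the sum of individual probabilities
    return expected, expected / len(probs)


# Hypothetical individual-level records standing in for census microdata.
people = [
    {"age_decades": 2.0, "low_income": 0},
    {"age_decades": 5.0, "low_income": 1},
    {"age_decades": 7.0, "low_income": 0},
]

expected, prevalence = predicted_prevalence(people)
print(f"expected infections: {expected:.2f}, prevalence: {prevalence:.2%}")
```

Summing probabilities rather than thresholding each person is what allows a non-integer prediction such as 737·5 to be compared with an observed count.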

Information

Type
Original Papers
Copyright
Copyright © 2007 Cambridge University Press

Table 1. Demographics of study populations

Table 2. Odds ratios from logistic regression using NHANES III data with the outcome of H. pylori infection. (The model included two interaction terms: ethnicity×race and income×race.)

Fig. 1. ROC curve for validation dataset. Sensitivity and specificity refer to the prediction of H. pylori infection using demographic risk factors with serology as the reference standard. The black dot (●) represents the greatest balance between sensitivity and specificity. The area under the curve is 0·69.
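As a rough illustration of how a single operating point can be picked from an ROC curve, the sketch below computes sensitivity and specificity at each candidate threshold and selects the one maximising Youden's J (sensitivity + specificity − 1). This criterion is an assumption made for the example; the caption does not state how "greatest balance" was defined. The probabilities and serology labels are invented.

```python
def sens_spec(probs, labels, threshold):
    """Sensitivity and specificity when 'p >= threshold' is called positive."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(probs, labels):
    """Return (J, threshold, sensitivity, specificity) maximising Youden's J."""
    best = None
    for t in sorted(set(probs)):
        se, sp = sens_spec(probs, labels, t)
        j = se + sp - 1.0
        if best is None or j > best[0]:
            best = (j, t, se, sp)
    return best


# Invented predicted probabilities and serology results (1 = seropositive).
probs = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

j, t, se, sp = best_threshold(probs, labels)
print(f"threshold={t}, sensitivity={se:.2f}, specificity={sp:.2f}, J={j:.2f}")
```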

Fig. 2. Map of study area. Percentages indicate the predicted prevalence of H. pylori infection in that county.

Fig. 3. Regression plots of H. pylori and age-adjusted gastric cancer rates. Each open symbol (○) represents a county, with the size of the symbol proportional to the population of the county. The regression was weighted by population and has the equation: gastric cancer rate=−11·76+63·172×H. pylori prevalence.
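A population-weighted linear regression of the kind described in this caption can be sketched in closed form. The county prevalences and populations below are invented, and the rates are generated directly from the published line so that the fit recovers it exactly; treating prevalence as a proportion between 0 and 1 is an assumption, since the caption does not give units.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x with weights w."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Invented county data: predicted prevalence (as a proportion) and population.
prevalence = [0.20, 0.25, 0.30, 0.35]
population = [100_000, 700_000, 1_500_000, 300_000]

# Synthetic rates placed exactly on the line reported in the caption.
rate = [-11.76 + 63.172 * p for p in prevalence]

intercept, slope = weighted_linear_fit(prevalence, rate, population)
print(f"gastric cancer rate = {intercept:.2f} + {slope:.3f} x prevalence")
```

With real county data the points would scatter around the line, and the weights would make populous counties dominate the fit.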

Table 3. Comparison of the predicted and observed numbers of infections using the NHANES III-derived infection parameter estimates and the validation dataset. Displayed for each category are the predicted number of infections and the observed number of infections with its associated standard error (s.e.).