How to use replicate weights in health survey analysis using the National Nutrition and Physical Activity Survey as an example

Published online by Cambridge University Press: 19 August 2019

Carole L Birrell*
Affiliation:
Centre for Statistical and Survey Methodology, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, University of Wollongong, Northfields Avenue, Wollongong, NSW 2522, Australia
David G Steel
Affiliation:
Centre for Statistical and Survey Methodology, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, University of Wollongong, Northfields Avenue, Wollongong, NSW 2522, Australia
Marijka J Batterham
Affiliation:
Centre for Statistical and Survey Methodology, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, University of Wollongong, Northfields Avenue, Wollongong, NSW 2522, Australia
Ankur Arya
Affiliation:
Centre for Statistical and Survey Methodology, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, University of Wollongong, Northfields Avenue, Wollongong, NSW 2522, Australia
*Corresponding author: Email cbirrell@uow.edu.au

Abstract

Objective:

To conduct nutrition-related analyses on large-scale health surveys, two aspects of the survey must be incorporated into the analysis: the sampling weights and the sample design. This practice is not always observed. The present paper compares three analyses: (1) unweighted; (2) weighted but not accounting for the complex sample design; and (3) weighted and accounting for the complex design using replicate weights.

Design:

Descriptive statistics are computed and a logistic regression investigation of being overweight/obese is conducted using Stata.

Setting:

Cross-sectional health survey with complex sample design where replicate weights are supplied rather than the variables containing sample design information.
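When only replicate weights are supplied, a design-based SE can be obtained by re-computing the estimate once per replicate weight and measuring how much those estimates vary around the full-sample estimate. The following is a minimal Python sketch of that idea for a weighted mean, not the paper's Stata analysis. The helper names, the toy data and the way the replicate weights are built (zeroing one group and rescaling the rest) are illustrative assumptions, and the multiplier (R - 1)/R is the usual delete-a-group jack-knife factor, assumed here rather than taken from the survey documentation.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of `values` using `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def make_replicate_weights(weights, groups, n_groups):
    """Hypothetical helper: build delete-a-group replicate weights.

    For replicate r, units in group r get weight 0 and all other units
    are scaled up by n_groups / (n_groups - 1).
    """
    scale = n_groups / (n_groups - 1)
    return [
        [0.0 if g == r else w * scale for w, g in zip(weights, groups)]
        for r in range(n_groups)
    ]


def jackknife_se(values, main_weights, replicate_weights, multiplier=None):
    """Jack-knife SE of a weighted mean from supplied replicate weights."""
    n_rep = len(replicate_weights)
    if multiplier is None:
        # Assumption: delete-a-group jack-knife factor (R - 1) / R.
        multiplier = (n_rep - 1) / n_rep
    full_estimate = weighted_mean(values, main_weights)
    squared_deviations = sum(
        (weighted_mean(values, rw) - full_estimate) ** 2
        for rw in replicate_weights
    )
    return math.sqrt(multiplier * squared_deviations)


# Toy data (not survey data): four units in two replicate groups.
values = [1.0, 2.0, 3.0, 4.0]
main_weights = [1.0, 1.0, 1.0, 1.0]
groups = [0, 0, 1, 1]

replicate_weights = make_replicate_weights(main_weights, groups, n_groups=2)
estimate = weighted_mean(values, main_weights)
se = jackknife_se(values, main_weights, replicate_weights)
print(estimate, se)  # 2.5 1.0
```

In a real analysis the replicate weights would be read from the data file rather than constructed, and the multiplier should be taken from the survey's technical documentation.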

Participants:

Responding adults from the National Nutrition and Physical Activity Survey (NNPAS), part of the Australian Health Survey (2011–2013).

Results:

Unweighted analysis produces biased estimates and incorrect estimates of SE. Adjusting for the sampling weights gives unbiased estimates but incorrect SE estimates. Incorporating both the sampling weights and the sample design gives unbiased estimates and correct SE estimates. This can affect interpretation. For example, in the unweighted analysis the OR for being a current smoker was incorrectly estimated as 1·20 (95 % CI 1·06, 1·37), t = 2·89, P = 0·004, suggesting a statistically significant relationship with being overweight/obese. When the sampling weights and complex sample design are correctly incorporated, the result is no longer statistically significant: OR = 1·06 (95 % CI 0·89, 1·27), t = 0·71, P = 0·480.
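The change in interpretation can be read directly from the two reported confidence intervals: an OR is statistically significant at the 5 % level only if its 95 % CI excludes 1. The short Python sketch below applies that check to the interval limits quoted above and compares the interval widths on the log-odds scale. The function names are illustrative, and the width ratio is only a rough indication of how much larger the design-based SE is, because the critical value used to build each interval may differ between methods.

```python
import math


def ci_excludes_one(lower, upper):
    """True if a confidence interval for an OR does not contain 1."""
    return lower > 1.0 or upper < 1.0


def log_ci_width(lower, upper):
    """Width of the interval on the log-odds scale."""
    return math.log(upper) - math.log(lower)


# 95 % CI limits for the current-smoker OR, as reported in the abstract.
unweighted_ci = (1.06, 1.37)
complex_design_ci = (0.89, 1.27)

print(ci_excludes_one(*unweighted_ci))      # True: appears significant
print(ci_excludes_one(*complex_design_ci))  # False: not significant

# Rough comparison of uncertainty between the two analyses.
width_ratio = log_ci_width(*complex_design_ci) / log_ci_width(*unweighted_ci)
print(round(width_ratio, 2))
```

The design-based interval is both shifted towards 1 and wider on the log scale, which together account for the loss of statistical significance.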

Conclusions:

Correct incorporation of the sampling weights and sample design is crucial for valid inference from survey data.

Information

Type
Research paper
Copyright
© The Authors 2019 

Table 1 Summary statistics for the sampling (person) weight variable (NPAFINWT) and the non-zero values of the first replicate weight variable (WPM0101)

Fig. 1 Histogram of the person (sampling) weight variable NPAFINWT, n 12 153

Fig. 2 Histogram of the first replicate weight variable WPM0101, n 12 153

Table 2 Results for estimates of mean Height (in cm), Weight (in kg), BMI, percentage of Overweight or Obese adults (BMI ≥ 25 kg/m²) and percentage of Current Smokers for all adults, males (M) and females (F), and associated SE, shown for three methods: (1) unweighted; (2) weighted; and (3) complex design using a jack-knife procedure with the replicate weights

Table 3 Results for logistic regression (OR, SE, t statistic, related P value and 95 % CI) for all adults (n 7874): whether or not an adult is Overweight or Obese given six explanatory variables is shown for three methods: (1) unweighted; (2) weighted; and (3) complex design using a jack-knife procedure with the replicate weights