Association between covariates and disease occurrence in the presence of diagnostic error

Published online by Cambridge University Press:  23 September 2011

F. LEWIS*
Affiliation:
Vetsuisse Faculty, University of Zürich, Zürich, Switzerland
M. J. SANCHEZ-VAZQUEZ
Affiliation:
Epidemiology Research Unit, SAC (Scottish Agricultural College), King's Buildings, West Mains Road, Edinburgh, UK
P. R. TORGERSON
Affiliation:
Vetsuisse Faculty, University of Zürich, Zürich, Switzerland
*Author for correspondence: Dr F. Lewis, Vetsuisse Faculty, University of Zürich, Winterthurerstrasse 270, Zürich, Switzerland, CH 8057. (Email: fraseriain.lewis@uzh.ch)

Summary

Identification of covariates associated with disease is a key part of epidemiological research. Yet, while adjustment for imperfect diagnostic accuracy is well established when estimating disease prevalence, similar adjustment when estimating covariate effects is far less common, despite its practical importance given the sensitivity of such analyses to misclassification error. Case-study data exploring evidence for seasonal differences in Salmonella prevalence using serological testing are presented, and simulated data with known properties are also analysed. It is demonstrated that: (i) adjusting for misclassification error in models comprising continuous covariates can have a very substantial impact on the conclusions that can be drawn from any analyses; and (ii) incorporating prior knowledge through Bayesian estimation can provide potentially more informative assessments of covariates while removing the assumption of perfect diagnostic accuracy. The method presented is widely applicable and easily generalized to many types of epidemiological studies.
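
The effect of misclassification on a covariate slope can be illustrated with a minimal sketch. It assumes the standard relationship between true prevalence p and apparent (test-positive) prevalence, Se·p + (1 − Sp)·(1 − p), together with a logistic model for true prevalence. The function names and all numerical values (beta0, beta1, se, sp) are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def true_prevalence(x, beta0, beta1):
    """Logistic model for the latent true prevalence at covariate value x."""
    return expit(beta0 + beta1 * x)

def apparent_prevalence(p_true, se, sp):
    """Probability of a positive test given true prevalence, Se and Sp."""
    return se * p_true + (1.0 - sp) * (1.0 - p_true)

# Hypothetical illustration values (assumptions, not from the paper).
beta0, beta1 = -2.0, 0.5
se, sp = 0.80, 0.90

for x in range(5):
    p = true_prevalence(x, beta0, beta1)
    q = apparent_prevalence(p, se, sp)
    print(f"x={x}  true={p:.3f}  apparent={q:.3f}")

# Change in log-odds between x = 0 and x = 4 on each scale.
true_change = logit(true_prevalence(4, beta0, beta1)) - logit(true_prevalence(0, beta0, beta1))
apparent_change = (logit(apparent_prevalence(true_prevalence(4, beta0, beta1), se, sp))
                   - logit(apparent_prevalence(true_prevalence(0, beta0, beta1), se, sp)))
print(f"true log-odds change: {true_change:.3f}, apparent: {apparent_change:.3f}")
```

With these illustrative values the apparent log-odds change is noticeably smaller than the true one, which is the kind of distortion that ignoring Se and Sp can introduce into a slope estimate.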

Information

Type
Original Papers
Copyright
Copyright © Cambridge University Press 2011

Fig. 1. Analysis of simulated data. (a, b) Raw data with corresponding fitted trend lines assuming the diagnostic test used was a gold-standard test [apparent prevalence (dashed line), true prevalence (solid line)] as a function of the covariate. As expected there is considerable difference between apparent and true prevalence. (a) n=100 per covariate pattern, test positive (T+) and true prevalence (D+); (b) n=2000 per covariate pattern. (c, d) Estimates for the slope parameters in panels (a) and (b). 95% confidence intervals (defined as where the horizontal lines cross the profile likelihood) show there is great uncertainty in the slope when the test is not a gold standard and this is still considerable even for the much larger sample size. (c) n=100 per covariate pattern; (d) n=2000 per covariate pattern.
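
The idea of reading a confidence interval off a likelihood curve at a horizontal cutoff can be sketched as follows. This is a toy example only: it uses a one-parameter binomial log-likelihood rather than the paper's latent variable model, and it assumes the usual likelihood-ratio cutoff (maximum log-likelihood minus half the 95% quantile of a chi-square distribution with one degree of freedom), which the caption does not state explicitly. The counts y=30 and n=100 are hypothetical.

```python
import math
from statistics import NormalDist

def binom_loglik(p, y, n):
    """Binomial log-likelihood (up to a constant) for y positives out of n."""
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

def likelihood_interval(y, n, level=0.95, grid_size=9999):
    """Grid-based likelihood interval for a binomial proportion.

    Assumes 0 < y < n. The cutoff is the maximum log-likelihood minus
    half the chi-square(1) quantile, computed as z**2 / 2.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    drop = 0.5 * z * z
    p_hat = y / n
    cutoff = binom_loglik(p_hat, y, n) - drop
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    inside = [p for p in grid if binom_loglik(p, y, n) >= cutoff]
    return min(inside), max(inside), drop

lower, upper, drop = likelihood_interval(30, 100)
print(f"cutoff drop = {drop:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

A flatter likelihood curve crosses the cutoff further from the maximum, which is why the intervals widen so much once the test is no longer treated as a gold standard.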

Fig. 2. Profile likelihood surface for Se and Sp estimated from Salmonella data. There is great uncertainty in the estimate of Se but relatively less in Sp. The MLE is (Se, Sp)=(0·99, 0·907) with the critical value defining a 95% confidence set within this surface at −2904·61 (outside the limits shown).

Table 1. Analysis of simulated data [95% confidence intervals for slope parameter (β1) by sample size and model type]

Fig. 3. (a) Analysis of Salmonella data. There are great differences between true and apparent prevalence. Total sample size of 8028 pigs, apparent prevalence (T+), latent true prevalence (D+). (b) After accounting for the impact of the imperfect test it is not possible to draw any conclusions as to seasonal changes in true prevalence. Range of trajectories corresponds to the joint 95% confidence set for Se and Sp.

Fig. 4. Bayesian estimation of Salmonella data using extremely strong priors of β(99, 1), close to perfect accuracy, for Se and Sp. There still exists a very large amount of uncertainty in estimates of true prevalence over time and much more than assuming a gold-standard test. (a) Prior and posterior densities for Se. (b) Prior and posterior densities for Sp. (c) Range of trajectories corresponding to the top 95% of log-likelihood values sampled during Markov chain estimation of the Bayesian latent variable model. The trajectory estimate with highest posterior log-likelihood is also shown.
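
To give a sense of how strong such a prior is, the following sketch reads β(99, 1) as a Beta(99, 1) distribution (an interpretation of the caption's notation) and uses the closed forms available when the second shape parameter is 1: density a·x^(a−1), distribution function x^a, and mean a/(a+1). The function names are hypothetical.

```python
def beta_a1_mean(a):
    """Mean of a Beta(a, 1) distribution."""
    return a / (a + 1.0)

def beta_a1_cdf(x, a):
    """Distribution function of Beta(a, 1) on [0, 1]: x**a."""
    return x ** a

def beta_a1_quantile(q, a):
    """Quantile function of Beta(a, 1): inverse of x**a."""
    return q ** (1.0 / a)

a = 99
print(f"prior mean           = {beta_a1_mean(a):.4f}")
print(f"2.5% prior quantile  = {beta_a1_quantile(0.025, a):.4f}")
print(f"P(accuracy > 0.95)   = {1.0 - beta_a1_cdf(0.95, a):.4f}")
```

Under this reading the prior places more than 99% of its mass above 0.95, which is what makes the remaining posterior uncertainty in true prevalence notable.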

Supplementary material: PDF

Lewis Supplementary Material

PDF 849.8 KB