
Shrinking a large dataset to identify variables associated with increased risk of Plasmodium falciparum infection in Western Kenya

  • M. TREMBLAY (a1), J. S. DAHM (a1), C. N. WAMAE (a2) (a3), W. A. DE GLANVILLE (a4) (a5), E. M. FÈVRE (a5) (a6) and D. DÖPFER (a1)...
Summary

Large datasets are often not amenable to analysis using traditional single-step approaches. Our general objective was to combine imputation techniques, principal component analysis (PCA), elastic net and generalized linear models in a systematic approach that extracts the most meaningful predictors of a health outcome from a large dataset. To demonstrate these techniques, we extracted predictors of Plasmodium falciparum infection from a dataset with many covariates but a limited number of observations, using data from the People, Animals, and their Zoonoses (PAZ) project. The data, collected from 415 homesteads in western Kenya, contained over 1500 variables describing the health, environment, and social factors of the humans, the livestock, and the homesteads in which they reside. The wide, sparse dataset was reduced to 42 predictors of P. falciparum infection, and wealth rankings were produced for all homesteads. The 42 predictors make biological sense and are supported by previous studies. This systematic data-mining approach could make many large datasets more manageable and more informative for decision-making processes and health policy prioritization.
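The sequence of steps named in the summary (imputation, PCA, elastic net, then a generalized linear model) can be illustrated with a minimal Python sketch on synthetic data. It is not the authors' code, and several parts are assumptions made for illustration: column-mean imputation stands in for multiple imputation, the first principal component of a hypothetical block of asset indicators is used as a wealth score, the elastic net is fitted by proximal gradient descent, and the penalty settings (`lam`, `alpha`) are arbitrary. All data and variable names are made up.

```python
import numpy as np

def mean_impute(X):
    """Fill NaNs with column means (a crude stand-in for multiple imputation)."""
    X = X.copy()
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = np.nanmean(X, axis=0)[cols]
    return X

def standardize(X):
    """Center each column and scale it to unit variance."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd

def first_pc_scores(X):
    """Scores of each row on the first principal component."""
    Z = standardize(X)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[0]

def elastic_net_logistic(X, y, lam=0.1, alpha=0.9, n_iter=2000):
    """Logistic regression with penalty lam*(alpha*|w|_1 + (1-alpha)/2*|w|_2^2),
    fitted by proximal gradient descent. lam=0 gives an unpenalized fit."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Step size from an upper bound on the curvature of the smooth part.
    step = 1.0 / (0.25 * (np.linalg.norm(X, 2) ** 2 / n + 1.0) + lam * (1 - alpha))
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad_w = X.T @ (prob - y) / n + lam * (1 - alpha) * w
        grad_b = np.mean(prob - y)
        w = w - step * grad_w
        # Soft-thresholding sets small coefficients exactly to zero.
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam * alpha, 0.0)
        b = b - step * grad_b
    return w, b

# Synthetic "wide" data: 400 units, 60 covariates, only the first 3 matter.
rng = np.random.default_rng(0)
n, p = 400, 60
X_true = rng.normal(size=(n, p))
eta = 2.0 * X_true[:, 0] - 2.0 * X_true[:, 1] + 2.0 * X_true[:, 2]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
X_obs = X_true.copy()
X_obs[rng.random((n, p)) < 0.10] = np.nan  # roughly 10% missing values

# Step 1: impute and standardize.
Z = standardize(mean_impute(X_obs))

# Step 2: wealth score from a hypothetical block of 8 binary asset indicators.
latent_wealth = rng.normal(size=n)
assets = (latent_wealth[:, None] + rng.normal(size=(n, 8)) > 0).astype(float)
scores = first_pc_scores(assets)
if np.corrcoef(scores, assets.sum(axis=1))[0, 1] < 0:
    scores = -scores  # the sign of a principal component is arbitrary
wealth_rank = np.argsort(np.argsort(-scores)) + 1  # 1 = highest score

# Step 3: elastic net screens the covariates.
w_pen, _ = elastic_net_logistic(Z, y, lam=0.1, alpha=0.9)
selected = np.flatnonzero(w_pen)

# Step 4: refit an unpenalized logistic model on the retained covariates.
w_glm, b_glm = elastic_net_logistic(Z[:, selected], y, lam=0.0)
coef = dict(zip(selected.tolist(), w_glm.tolist()))
print("retained covariates:", selected.tolist())
```

In practice the penalty strength would be chosen by cross-validation rather than fixed, and a dedicated package would replace the hand-written solver; the sketch only shows how the four steps feed into one another.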

Copyright
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Corresponding author
* Author for correspondence: Dr M. Tremblay, Department of Medical Sciences, School of Veterinary Medicine, 2015 Linden Drive, Madison, WI 53706, USA. (Email: mtremblay@wisc.edu)
Epidemiology & Infection
  • ISSN: 0950-2688
  • EISSN: 1469-4409
  • URL: /core/journals/epidemiology-and-infection