
Applications of machine learning techniques to predict filariasis using socio-economic factors

Published online by Cambridge University Press: 02 September 2019

Phani Krishna Kondeti
Affiliation:
Bioinformatics Group, Department of Applied Biology, CSIR-Indian Institute of Chemical Technology, Hyderabad-500 007, Andhra Pradesh, India
Kumar Ravi
Affiliation:
Centre for Excellence in Analytics, Institute for Development and Research in Banking Technology, Hyderabad-500 057, Telangana, India
Srinivasa Rao Mutheneni*
Affiliation:
Bioinformatics Group, Department of Applied Biology, CSIR-Indian Institute of Chemical Technology, Hyderabad-500 007, Andhra Pradesh, India
Madhusudhan Rao Kadiri
Affiliation:
Bioinformatics Group, Department of Applied Biology, CSIR-Indian Institute of Chemical Technology, Hyderabad-500 007, Andhra Pradesh, India
Sriram Kumaraswamy
Affiliation:
Bioinformatics Group, Department of Applied Biology, CSIR-Indian Institute of Chemical Technology, Hyderabad-500 007, Andhra Pradesh, India
Ravi Vadlamani
Affiliation:
Centre for Excellence in Analytics, Institute for Development and Research in Banking Technology, Hyderabad-500 057, Telangana, India
Suryanaryana Murty Upadhyayula
Affiliation:
National Institute of Pharmaceutical Education and Research, Guwahati-781 032, Assam, India
*Author for correspondence: Srinivasa Rao Mutheneni, E-mail: msrinivas@iict.res.in

Abstract

Filariasis is one of the major public health concerns in India. Approximately 600 million people spread across 250 districts of India are at risk of filariasis. To predict this disease, a pilot-scale study was carried out in 30 villages of Karimnagar district, Telangana, from 2004 to 2007 to collect epidemiological and socio-economic data. The collected data were analysed using various machine learning techniques: Naïve Bayes (NB), logistic model tree, probabilistic neural network, J48 (C4.5), classification and regression tree, JRip and gradient boosting machine. The performance of these algorithms is reported in terms of sensitivity, specificity, accuracy and area under the ROC curve (AUC). Among all the classification methods employed, NB yielded the best AUC of 64%, and its performance was statistically comparable with that of the rest of the classifiers. In addition, the J48 algorithm generated 23 decision rules that can help in developing an early warning system to implement better prevention and control efforts in the management of filariasis.
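The abstract reports sensitivity, specificity and accuracy without stating how they are computed. As a minimal sketch (not the authors' code), assuming a binary outcome in which a filariasis-positive case is coded as 1, the three measures follow directly from the confusion-matrix counts; the class and function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # positives correctly predicted positive
    fn: int  # positives wrongly predicted negative
    tn: int  # negatives correctly predicted negative
    fp: int  # negatives wrongly predicted positive


def confusion_from_labels(y_true, y_pred, positive=1):
    """Tally confusion-matrix counts for a binary problem (hypothetical helper)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return Confusion(tp, fn, tn, fp)


def sensitivity(c):
    """True positive rate: share of actual positives that are detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """True negative rate: share of actual negatives correctly rejected."""
    return c.tn / (c.tn + c.fp)


def accuracy(c):
    """Share of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)
```

On an imbalanced dataset, accuracy can remain high even when sensitivity is low, which is one reason to report all three measures alongside the AUC.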

Information

Type: Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright © The Author(s) 2019

Fig. 1. Schematic diagram of the proposed methodology.

Table 1. Epidemiological and socio-economic attributes for the prediction of filariasis

Table 2. Results obtained for the imbalanced dataset without feature selection

Table 3. Results obtained using 100% oversampling and gain ratio feature selection
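Gain ratio, the same criterion C4.5 (J48) uses to choose splits, divides an attribute's information gain by its split information so that attributes with many distinct values are not unduly favoured. A minimal sketch for categorical attributes follows; the function names are illustrative, and ranking attributes by this score is an assumption about how the feature selection was applied, not a description of the authors' exact procedure.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of one categorical attribute with respect to the class labels."""
    n = len(labels)
    # Partition the class labels by attribute value
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)  # intrinsic information of the split itself
    return info_gain / split_info if split_info > 0 else 0.0


def rank_attributes(columns, labels):
    """Return attribute names (keys of a hypothetical dict of columns), best first."""
    return sorted(columns, key=lambda name: gain_ratio(columns[name], labels), reverse=True)
```

An attribute that takes a single value carries no split information, so the sketch scores it 0 rather than dividing by zero.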

Table 4. Results obtained using 400% oversampling and gain ratio feature selection
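The table captions express oversampling as a percentage of the minority class, but the source does not name the method. The sketch below assumes a SMOTE-style scheme in which 100% adds as many synthetic cases as there are minority cases and 400% adds four times as many, each new case interpolated between a minority case and one of its k nearest minority neighbours. It also assumes purely numeric feature vectors; categorical socio-economic attributes would need encoding or a nominal-aware variant. The function name and defaults are hypothetical.

```python
import math
import random


def smote(minority, percent, k=5, seed=0):
    """SMOTE-style oversampling sketch (illustrative, not the authors' code).

    minority: list of numeric feature vectors from the minority class.
    percent:  oversampling percentage; 100 adds len(minority) synthetic
              cases, 400 adds four times that number.
    """
    if percent <= 0 or len(minority) < 2:
        return []
    rng = random.Random(seed)  # fixed seed so results are reproducible
    k = min(k, len(minority) - 1)
    n_new = int(len(minority) * percent / 100)
    synthetic = []
    for i in range(n_new):
        # Cycle through minority cases in order; for percent < 100 this simply
        # uses the first cases, whereas the original SMOTE picks them at random.
        idx = i % len(minority)
        base = minority[idx]
        # k nearest minority neighbours of `base` by Euclidean distance, excluding itself
        neighbours = sorted(
            (x for j, x in enumerate(minority) if j != idx),
            key=lambda x: math.dist(base, x),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # New point on the line segment between `base` and `other`
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Synthetic cases should be added to the training data only; oversampling before the train/test split would let near-copies of test cases leak into training.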

Fig. 2. Tree generated using the J48 algorithm.

Fig. 3. Decision rules obtained using CART.

Fig. 4. Rules generated by the J48 algorithm.

Fig. 5. ROC curve and area under the curve (AUC) for GBM.
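The AUC condenses the ROC curve into a single number: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, so an AUC of 64%, as reported for NB in the abstract, corresponds to 0.64 on this scale. A minimal sketch that computes the AUC directly from this pairwise definition (ties counted as one half), assuming the classifier outputs a score or predicted probability for each case; the function name is hypothetical.

```python
def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive case correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of cases, which is adequate for illustration; a rank-based formula gives the same value more efficiently.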

Table 5. Test sample dataset

Table 6. Results obtained using different combinations of variables

Table 7. Relative feature importance obtained using GBM

Supplementary material: File

Kondeti et al. supplementary material
