
Predicting incidence of hepatitis E using machine learning in Jiangsu Province, China

Published online by Cambridge University Press:  28 July 2022

Xiaoqing Cheng
Affiliation:
Jiangsu Provincial Centre for Disease Control and Prevention (Jiangsu Institution of Public health), Nanjing, Jiangsu, China; Chinese Field Epidemiology Training Program, Chinese Center for Disease Control and Prevention, Beijing, China
Wendong Liu
Affiliation:
Jiangsu Provincial Centre for Disease Control and Prevention (Jiangsu Institution of Public health), Nanjing, Jiangsu, China
Xuefeng Zhang
Affiliation:
Jiangsu Provincial Centre for Disease Control and Prevention (Jiangsu Institution of Public health), Nanjing, Jiangsu, China
Minghao Wang*
Affiliation:
School of Computer Science and Engineering, Southeast University, Nanjing, China
Changjun Bao*
Affiliation:
Jiangsu Provincial Centre for Disease Control and Prevention (Jiangsu Institution of Public health), Nanjing, Jiangsu, China
Tianxing Wu*
Affiliation:
School of Computer Science and Engineering, Southeast University, Nanjing, China
*Authors for correspondence: Tianxing Wu, E-mail: tianxingwu@seu.edu.cn; Minghao Wang, E-mail: wmh@seu.edu.cn; Changjun Bao, E-mail: bao2000_cn@163.com

Abstract

Hepatitis E is an increasingly serious public health problem worldwide and has attracted extensive attention. Accurate prediction of hepatitis E incidence is needed to plan future medical care. In this study, we developed a Bi-LSTM model that incorporates meteorological factors to predict the incidence of hepatitis E. The hepatitis E data used in this study were collected by the Jiangsu Provincial Center for Disease Control and Prevention from January 2005 to March 2017. ARIMA, GBDT, SVM, LSTM and Bi-LSTM models were compared. Data from January 2009 to September 2014 were used as the training set to fit the models, and data from October 2014 to March 2017 were used as the testing set to evaluate their predictive accuracy. Model selection and evaluation were based on the mean absolute percentage error (MAPE), root mean square error (RMSE) and mean absolute error (MAE). A total of 44 923 hepatitis E cases were detected in Jiangsu Province from January 2005 to March 2017, and the average monthly incidence rate was 0.35 per 100 000 persons. Incorporating the combination of temperature, water vapour pressure and rainfall into the Bi-LSTM model gave the best performance among the compared models for predicting the monthly incidence of hepatitis E, with an RMSE of 0.044, a MAPE of 11.88% and an MAE of 0.0377. The Bi-LSTM model with these three meteorological factors can extract both the linear and the non-linear information in the hepatitis E incidence data, and improved interpretability, learning ability, generalisability and prediction accuracy.
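The evaluation protocol described in the abstract (a chronological train/test split, then MAPE, RMSE and MAE on the test months) can be sketched in plain Python as below. This is a minimal sketch, not the authors' code: the `(year, month)` keys and all function names are hypothetical, and the toy values in the usage are illustrative only, not study data.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: each observation is ((year, month), incidence per 100 000).
Month = Tuple[int, int]


def chronological_split(
    series: Sequence[Tuple[Month, float]],
    train_window: Tuple[Month, Month],
    test_window: Tuple[Month, Month],
) -> Tuple[List[float], List[float]]:
    """Split a monthly series into training and testing values by inclusive date windows."""
    (train_start, train_end), (test_start, test_end) = train_window, test_window
    # Tuples compare lexicographically, so (2014, 9) < (2014, 10) < (2015, 1).
    train = [v for m, v in series if train_start <= m <= train_end]
    test = [v for m, v in series if test_start <= m <= test_end]
    return train, test


def _check(y: Sequence[float], y_hat: Sequence[float]) -> None:
    if len(y) != len(y_hat) or not y:
        raise ValueError("observed and predicted series must be non-empty and equally long")


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(y, y_hat)
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error."""
    _check(y, y_hat)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute percentage error in per cent; undefined if any observed value is 0."""
    _check(y, y_hat)
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


# Usage with the windows stated in the abstract and toy values (not study data).
series = [((2014, 8), 0.40), ((2014, 9), 0.50), ((2014, 10), 0.30), ((2014, 11), 0.35)]
train, test = chronological_split(series, ((2009, 1), (2014, 9)), ((2014, 10), (2017, 3)))
toy_forecast = [0.27, 0.35]
print(train, test)
print(mae(test, toy_forecast), rmse(test, toy_forecast), mape(test, toy_forecast))
```

Splitting by calendar month rather than shuffling keeps every test month strictly after the training window, which mirrors how a forecast is used in practice. MAPE divides by the observed value, so it is undefined for any month with zero incidence.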

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
Copyright © The Author(s), 2022. Published by Cambridge University Press

Fig. 1. The structure of our proposed model.
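The proposed model combines a bidirectional LSTM with meteorological inputs. A dependency-free forward pass can be sketched as follows, assuming that each time step feeds one month's vector such as [incidence, temperature, water vapour pressure, rainfall], that each direction is a standard LSTM, and that a linear output layer reads the concatenated final states of both directions. Layer sizes, the window length, the parameter layout and every function name here are hypothetical, and training is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical parameter layout: gate name -> {"W": H x D matrix, "U": H x H matrix, "b": H vector}.
Params = Dict[str, Dict[str, list]]


def sigmoid(z: float) -> float:
    # Branching avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def zero_params(input_dim: int, hidden: int) -> Params:
    """All-zero parameters for gates i (input), f (forget), o (output), g (candidate)."""
    return {
        gate: {
            "W": [[0.0] * input_dim for _ in range(hidden)],
            "U": [[0.0] * hidden for _ in range(hidden)],
            "b": [0.0] * hidden,
        }
        for gate in "ifog"
    }


def lstm_step(x: Sequence[float], h: Vector, c: Vector, p: Params) -> Tuple[Vector, Vector]:
    """One standard LSTM update; returns the new hidden and cell states."""

    def pre(gate: str) -> Vector:
        wx, uh = matvec(p[gate]["W"], x), matvec(p[gate]["U"], h)
        return [a + b + bias for a, b, bias in zip(wx, uh, p[gate]["b"])]

    i = [sigmoid(z) for z in pre("i")]
    f = [sigmoid(z) for z in pre("f")]
    o = [sigmoid(z) for z in pre("o")]
    g = [math.tanh(z) for z in pre("g")]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(seq: Sequence[Sequence[float]], p: Params, hidden: int) -> Vector:
    """Run one direction over the sequence and return its final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        h, c = lstm_step(x, h, c, p)
    return h


def bilstm_forecast(seq: Sequence[Sequence[float]], p_fwd: Params, p_bwd: Params,
                    w_out: Sequence[float], b_out: float, hidden: int) -> float:
    """Predict next month's incidence from a window of monthly feature vectors."""
    h_fwd = run_lstm(seq, p_fwd, hidden)  # reads months oldest to newest
    h_bwd = run_lstm(list(reversed(seq)), p_bwd, hidden)  # reads newest to oldest
    features = h_fwd + h_bwd  # list concatenation: 2 * hidden values
    return sum(w * z for w, z in zip(w_out, features)) + b_out


# Toy window of three months: [incidence, temperature, vapour pressure, rainfall] (illustrative).
window = [[0.30, 15.2, 12.1, 80.0], [0.35, 10.4, 9.0, 45.0], [0.42, 4.1, 6.3, 30.0]]
# With all-zero weights every hidden state stays 0, so this prints the output bias, 0.25.
print(bilstm_forecast(window, zero_params(4, 2), zero_params(4, 2), [1.0] * 4, 0.25, 2))
```

In practice the inputs would be standardised and the weights learned by backpropagation through time, typically with a deep learning framework rather than hand-written loops; the sketch only shows how the two directions and the meteorological inputs fit together.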


Fig. 2. The structure of an LSTM cell.
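For reference, the standard LSTM cell without peephole connections updates its states as below; this assumes Fig. 2 depicts that common variant, since the figure itself is not reproduced here. x_t is the input at month t, h_{t-1} and c_{t-1} are the previous hidden and cell states, σ is the logistic sigmoid and ⊙ is element-wise multiplication. A Bi-LSTM runs one such cell forward in time and another backward, and concatenates the two hidden states.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right) \\
h_t^{\mathrm{Bi}} &= \left[\overrightarrow{h}_t \,;\, \overleftarrow{h}_t\right]
\end{aligned}
```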


Fig. 3. The incidence of hepatitis E in Jiangsu Province from January 2005 to March 2017.
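Incidence here is expressed per 100 000 persons. The arithmetic can be sketched as below, assuming each monthly rate is the number of cases divided by the resident population for that month; the population denominators are not given in this excerpt, so the inputs in the usage are hypothetical. The window January 2005 to March 2017 spans 147 months, so the 44 923 detected cases correspond to roughly 306 cases per month on average.

```python
from typing import Sequence, Tuple

Month = Tuple[int, int]  # (year, month)


def months_inclusive(start: Month, end: Month) -> int:
    """Number of calendar months from start to end, counting both ends."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1]) + 1


def incidence_per_100k(cases: int, population: float) -> float:
    """Incidence rate per 100 000 persons for one period."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


def mean_monthly_incidence(cases: Sequence[int], population: Sequence[float]) -> float:
    """Average of the monthly rates (not the rate of the pooled totals)."""
    rates = [incidence_per_100k(c, p) for c, p in zip(cases, population)]
    return sum(rates) / len(rates)


print(months_inclusive((2005, 1), (2017, 3)))  # 147
print(incidence_per_100k(280, 80_000_000))  # hypothetical case count and population
```

Averaging monthly rates and dividing total cases by total person-months give slightly different answers when the population changes over time; which convention the study used is not stated here.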


Table 1. Results of five models for monthly incidence of hepatitis E prediction


Fig. 4. Observed monthly incidence of hepatitis E and the values predicted by the different models.


Table 2. Combinations of meteorological factors, sorted by RMSE in ascending order
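Ranking combinations of meteorological factors by RMSE amounts to an exhaustive subset search, which can be sketched as below: every non-empty subset of the candidate factors is scored by a caller-supplied evaluation function (for example, fitting the Bi-LSTM on those inputs and returning the test RMSE) and the results are sorted in ascending order. The three factor names are those named in the abstract; the full candidate set in Table 2 may be larger, and the toy scoring function is a placeholder, not study results.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Combo = Tuple[str, ...]


def rank_factor_combinations(
    factors: Sequence[str],
    evaluate: Callable[[Combo], float],
) -> List[Tuple[Combo, float]]:
    """Score every non-empty subset of factors and sort by score, lowest (best RMSE) first."""
    scored = []
    for k in range(1, len(factors) + 1):
        for combo in combinations(factors, k):
            scored.append((combo, evaluate(combo)))
    scored.sort(key=lambda item: item[1])  # stable sort keeps subset order for ties
    return scored


factors = ["temperature", "water_vapour_pressure", "rainfall"]


def toy_rmse(combo: Combo) -> float:
    # Placeholder: a real evaluation would fit a model on these inputs and compute test RMSE.
    return 0.10 - 0.01 * len(combo)


for combo, score in rank_factor_combinations(factors, toy_rmse):
    print(f"{score:.3f}  {' + '.join(combo)}")
```

With n candidate factors there are 2^n - 1 non-empty combinations (7 for three factors), so exhaustive search is cheap here but grows quickly as more factors are added.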


Table 3. Results of six models for monthly incidence of hepatitis E prediction


Fig. 5. Prediction intervals of (a) the ARIMA model, (b) the LSTM model, (c) the Bi-LSTM model and (d) the Bi-LSTM model with the best combination of meteorological factors.
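This excerpt does not state how the prediction intervals were obtained. As one common and simple option, shown here as an assumption rather than the authors' method, a symmetric interval can be built from out-of-sample residuals under a normal approximation: forecast ± z times the residual standard deviation, where z is the standard normal quantile for the chosen coverage. The function and argument names are hypothetical, and the values in the usage are toy numbers.

```python
from statistics import NormalDist, stdev
from typing import List, Sequence, Tuple


def residual_intervals(
    observed: Sequence[float],
    predicted: Sequence[float],
    forecasts: Sequence[float],
    level: float = 0.95,
) -> List[Tuple[float, float]]:
    """Normal-approximation intervals: forecast +/- z * sd(observed - predicted)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired observations")
    residuals = [o - p for o, p in zip(observed, predicted)]
    sigma = stdev(residuals)  # sample standard deviation of the residuals
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return [(f - z * sigma, f + z * sigma) for f in forecasts]


# Toy residual history and forecasts (illustrative values only).
print(residual_intervals([0.32, 0.28, 0.41, 0.37], [0.30, 0.30, 0.38, 0.39], [0.35, 0.40]))
```

This assumes roughly normal, constant-variance residuals taken from data not used for fitting; if the residuals are skewed, empirical residual quantiles or bootstrap intervals are safer alternatives.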