How to improve infectious disease prediction by integrating environmental data: an application of a novel ensemble analysis strategy to predict HFMD

Published online by Cambridge University Press: 15 January 2021

Junwen Tao
Affiliation:
West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, China
Yue Ma
Affiliation:
West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, China
Xuefei Zhuang
Affiliation:
West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, China
Qiang Lv
Affiliation:
Sichuan Center for Disease Control and Prevention, Chengdu, Sichuan, People's Republic of China
Yaqiong Liu
Affiliation:
Sichuan Center for Disease Control and Prevention, Chengdu, Sichuan, People's Republic of China
Tao Zhang*
Affiliation:
West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, China
Fei Yin*
Affiliation:
West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, China
*Authors for correspondence: Tao Zhang, E-mail: taozscu@163.com; Fei Yin, E-mail: scupublichealth@163.com

Abstract

This study proposed a novel ensemble analysis strategy to improve hand, foot and mouth disease (HFMD) prediction by integrating environmental data. The approach began by establishing a vector autoregressive (VAR) model. Then, a dynamic Bayesian network (DBN) model was used to select the environmental variables. Finally, a VAR model with constraints (CVAR) was established to predict the incidence of HFMD in Chengdu city from 2011 to 2017. The DBN showed that temperature was related to HFMD at lags 1 and 2, whereas humidity, wind speed, sunshine, PM₁₀, SO₂ and NO₂ were related to HFMD at lag 2. Compared with the autoregressive integrated moving average model with external variables (ARIMAX), the CVAR model had a higher coefficient of determination (R², average difference: +2.11%; t = 6.2051, P = 0.0003 < 0.05), a lower root mean-squared error (RMSE, −24.88%; t = −5.2898, P = 0.0007 < 0.05) and a lower mean absolute percentage error (MAPE, −16.69%; t = −4.3647, P = 0.0024 < 0.05). The accuracy of predicting the time-series shape was 88.16% for the CVAR model and 86.41% for ARIMAX. The CVAR model performed better in terms of variable selection, model interpretation and prediction. Therefore, it could be used by health authorities to identify potential HFMD outbreaks and develop disease control measures.
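The three accuracy measures named in the abstract, and the paired comparison between two sets of models, can be sketched as follows. This is a minimal illustration using common textbook definitions, not the authors' code: the function names and the toy numbers are hypothetical, R² is assumed to be 1 − SS_res/SS_tot (the paper may define it differently), and the paired t statistic is assumed to be computed over per-model metric differences.

```python
import math
from statistics import mean, stdev


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean-squared error."""
    return math.sqrt(mean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    return 100.0 * mean([abs(o - p) / abs(o) for o, p in zip(observed, predicted)])


def paired_t(metric_a, metric_b):
    """Paired t statistic and degrees of freedom for two matched lists of metric values."""
    diffs = [a - b for a, b in zip(metric_a, metric_b)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical toy data, not taken from the study.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(r_squared(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```

With nine matched model pairs, `paired_t` would return 8 degrees of freedom; the P value would then be read from a t distribution with those degrees of freedom.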

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press
Table 1. Variable names, abbreviations and units in this study

Fig. 1. Process of the novel ensemble analysis strategy.

Fig. 2. Time-series plots of variables in this study.

Table 2. Descriptions of daily HFMD incidence, meteorological and air pollution variables in Chengdu from 2011 to 2017

Fig. 3. Summarised DBN graph of the DBN_①−⑨ models.

Table 3. The sensitivity analysis of the CVAR_①−⑨ models

Fig. 4. Summarised impulse response analysis of the CVAR_①−⑨ models.

Table 4. Results of the ARIMAX_①−⑨ models

Fig. 5. Incidence of HFMD predicted by the CVAR_①−⑨ and ARIMAX_①−⑨ models in the test set.

Fig. 6. Incidence of HFMD fitted by the CVAR_①−⑨ and ARIMAX_①−⑨ models in the training set.

Table 5. Comparisons of R², RMSE, MAPE, ranges and means between the CVAR_①−⑨ and ARIMAX_①−⑨ models for 1-day ahead dynamic prediction

Table 6. The averaged confusion matrix of the CVAR models

Table 7. The averaged confusion matrix of the ARIMAX models

Supplementary material: PDF

Tao et al. supplementary material (PDF, 1.2 MB)