Forecasting the incidence of acute haemorrhagic conjunctivitis in Chongqing: a time series analysis

Acute haemorrhagic conjunctivitis is a highly contagious eye disease, the prediction of acute haemorrhagic conjunctivitis is very important to prevent and grasp its development trend. We use the exponential smoothing model and the seasonal autoregressive integrated moving average (SARIMA) model to analyse and predict. The monthly incidence data from 2004 to 2017 were used to fit two models, the actual incidence of acute haemorrhagic conjunctivitis in 2018 was used to validate the model. Finally, the prediction effect of exponential smoothing is best, the mean square error and the mean absolute percentage error were 0.0152 and 0.1871, respectively. In addition, the incidence of acute haemorrhagic conjunctivitis in Chongqing had a seasonal trend characteristic, with the peak period from June to September each year.


Introduction
Acute haemorrhagic conjunctivitis (AHC) is a highly infective eye disease, which is mainly caused by either enterovirus 70 (EV70) or coxsackievirus A24 (CVA24) infection [1,2]. The main symptoms are conjunctival congestion, burning feeling, photophobia, tears, feeling foreign body and so on. Outbreaks have been found in many countries, such as India, Senegal and so on [3][4][5]. Although cases occur every month, AHC incidence has obvious seasonal characteristics [6]. The symptoms appear within 2 days after direct contact with the source of infection [7], and the incidence of AHC is related to temperature and humidity [8].
In China, AHC is defined as a notifiable infectious disease [9]. It is a common eye infection disease in our country and has been reported in many cities, the incidence of AHC in Chongqing ranks the top 5 in China [10]. The epidemic circumstance still serious because AHC is highly contagious and there is no specific effective treatment. Modelling and forecasting of the incidence of AHC provide a basis for developing policies and interventions.
Time series analysis is a scientific quantitative prediction of the future trend of diseases based on historical data and time variables. It is a quantitative analysis method without considering the influence of complex factors [11] and widely used in various fields. The exponential smoothing method has the characteristics of simple calculation, no requirements for data distribution, and easy operation. It is a simple and feasible short-term time series prediction method [12]. ARIMA model is one of the most representative and widely applied models in the prediction of time series [13], this method is very simple and requires only endogenous variables instead of other exogenous variables.
Although there are some AHC studies based on epidemic disease [2,14,15], it is the first time to apply exponential smoothing method and SARIMA model for the incidence of AHC in Chongqing. This study used monthly incidence of AHC from 2004 to 2017 as a training set for fitting SARIMA model and Holt-Winters exponential smoothing model. Monthly data of 2018 as the verification set, comparing two models of the incidence of AHC to predict performance, to determine the optimal model to predict the tendency of the incidence of AHC and provide scientific basis for AHC prevention and control in Chongqing.

Study area and data collection
Chongqing is the fourth city firsthand under the jurisdiction of the Chinese central govern-ment， it is the linkage between developed East and central [16]. Chongqing (28°10´-32°13 N, 105°11´-110°11E) covers a land area of 82 402 km 2 with 38 districts and counties and an inhabitant population of about 31.01 million by the end of 2018. Now, Chongqing has become the largest and most populous city in China [17].
The incidence data are primarily gained from the Chongqing Center of Disease andControl (2004-2018), the data included not only the monthly incidence of AHC but also the age, sex and occupation for each case.

Statistical analysis
Firstly, in this study, using epidemiology to briefly describe the AHC epidemic in Chongqing, including the temporal and spatial distribution, high-incidence age group and so on. Then all the monthly incidence of AHC data were modelled and predicted, all the data analyses were performed using R 3.5.0.

Exponential smoothing model construction
For the deterministic analysis of the series, consider the seasonality of infectious diseases, we adopted the Holt-Winters exponential smoothing model. The model structure is as follows [18]: The additive Holt-Winters' seasonal exponential smoothing model is: The multiplicative Holt-Winters' seasonal exponential smoothing model is: In the equation, a t is the horizontal part of the sequence, b t is the trend part of the sequence, s t is the seasonal factor of the sequence (π is the cycle length of the sequence, and the cycle length studied in this paper is 12,π = 12).

SARIMA model construction
For the randomness analysis of the series, given the seasonal trend, we selected the SARIMA model. According to the literatures, the SARIMA model is developed from the ARIMA model. It uses autoregressive parameters, moving average parameters and the number of differencing passes to describe a series in which a pattern is repeated over time. In this study, SARIMA model was fitted using monthly incidence of AHC as the dependent variable and its past observations as the independent variable. The general model form of SARIMA fitted to the original observation sequence is [19] In the equation, B is the backward shift operator, ε t is the estimated residual at time t with zero mean and constant variance and x t denotes the observed value at time t (t = 1, 2 … k). The process is called SARIMA( p, d, q) × (P, D, Q) S (s is the length of the seasonal period). Where p, d and q are the autoregressive order, number of difference and moving average order, respectively; P, D and Q are the seasonal autoregressive order, number of seasonal difference and seasonal moving average order, respectively [19]. According to the autocorrelation function and partial autocorrelation function of the sequence, the values of the six parameters in the SARIMA model are determined and the fitted SARIMA model is obtained. The model was optimised by comparing Akaike information criterion (AIC), smaller AIC indicate the better fitting model [20].
The SARIMA model was performed using R software, in addition, with a two-sided significance level of P < 0.05.

Model comparison
The predicted accuracy of the Holt-Winters exponential smoothing model and the SARIMA model was compared by calculating the prediction error: the mean square error (MSE) and mean absolute percentage error (MAPE) [21]. Their mathematical formulas are [22]: Where x t is the actual incidence value, x t is the estimated incidence value, and n is the amount of months for forecasting. The MSE and MAPE were calculated to evaluate the accuracy of the forecast and to choose the best model. A lower MAPE value indicates a better fit of the data.

Descriptive analyses
This study reported 30 686 AHC cases in the past 15 years (2004-2018), in Chongqing, including 18 121 males and 12 565 females, and a male-to-female ratio of 1.44:1. AHC mostly occur within the ages of 10-19 years, what is more, the age group of 10-19 accounted for the 62.69% of all reported cases. The highest percentage of AHC cases was found in students, amount to 0.3975 (n = 12 198), followed by farmers and children. Figure 1 shows a sequence chart of the incidence of AHC from 2004 to 2018, which is mainly manifested as the epidemic peak from June to September, it shows obvious seasonal characteristics. Furthermore, there were several outliers in 2007, 2008, 2010 and 2014. Table 1 shows AHC ranked the top 10 regions in terms of prevalence in Chongqing from 2004 to 2018. The main popular areas of AHC were located in Kaizhou District, Beibei District and Rongchang District of Chongqing.  Figure 2 (the red line represents the original data, the solid blue line represents the fitting value, the dotted blue line represents the 95% confidence interval and the green line represents the actual incidence in 2018). Table 2 shows that the predicted values by three exponential smoothing methods (model one is additive Holt-Winters' seasonal exponential smoothing method, model two is multiplicative Holt-Winters' seasonal exponential smoothing method, model three is the additive Holt-Winters' seasonal exponential smoothing method after taking the natural logarithm of the original data). In addition, we used the 'decompose' function to decompose sequences. As shown in Figure 3, the data have obvious seasonality, tendency, and the residual sequence diagram is stationary, the multiplicative Holt-Winters' seasonal exponential smoothing model is significant.

SARIMA model
Firstly, a sequence diagram was drawn for AHC incidence data (Fig. 4a), the sequence diagram is non-stationary. After the natural logarithm transformation of the original data, we conducted a one-step difference and seasonal difference with a period of 12 to remove seasonal trends. The sequence diagram after the difference is basically stationary (Fig. 4b), and the ADF test results show that the sequence is stationary (P < 0.05). We estimated the six parameter values of the SARIMA ( p, d, q and P, D, Q) model based on the ACF and PACF graphs of the transformed time series. Observed the graphs of ACF and the graphs of PACF (Fig. 5) after data transformation, we developed four models and chosen 'CSS-ML' method to estimate the parameters.
SARIMA (1, 1, 1) (0, 1, 1) 12 Table 3 shows AIC values, MSE and MAPE for the SARIMA models to various choices of p and q. Although the MSE and MAPE values of the four model training sets are very close, the SARIMA(1, 1, 2) × (0, 1, 1) 12 model had the lowest AIC value. Goodness-of-fit analysis indicated that the SARIMA(1, 1, 2) × (0, 1, 1) 12 model fitted the data reasonably well. Moreover, in the residual white noise test of SARIMA(1, 1, 2) × (0, 1, 1) 12 model, the P-values of LB statistics were 0.9488 and 0.9981 (P > 0.05), respectively. The residual sequences are pure random, the stationary residual sequence indicates that the fitted SARIMA(1, 1, 2) × (0, 1, 1) 12 model is sufficient, the SARIMA(1, 1, 2) × (0, 1, 1) 12 model was significant. The equation of the SARIMA was: The predicted values were verified by applying the value from January to December 2018. The actual incidence and forecasted incidence from January to December 2018 in the Table 4 and the prediction diagram is shown in Figure 6 (the blue line represents the actual value, the solid red line represents the predicted value, the dashed red line represents the 95% confidence interval, and the solid grey line represents the actual value). It demonstrated that the results show that the predicted value can fluctuate Table 1.   Table 5, and the comparison showed that the prediction effect of the exponential smoothing Holt-Winters with multiplicative seasonality model was slightly better than that of the SARIMA(1, 1, 2) × (0, 1, 1) 12 model.

Discussion
AHC is one of the common eye diseases in Chongqing. The annual incidence was 1.569/100 000 at the lowest and 41.1682/ 100 000 at the highest from 2004 to 2018, in Chongqing, and affects nearly 1.5 times as many males as females in a total of 30 686 reported cases (1.44:1), which is consistent with some previous research results [23]. The high peaks about age and career of AHC cases were from 10 to 19 years old (62.69%) and students (39.75%), respectively, which indicate that the important protection objectives should be concentrated in students and teenagers. Moreover, although AHC occurs all year, the peak of the incidence of AHC is from June to September each year, showing the obvious periodicity and seasonality [24]. In this study, there are several special values that occurred in 2007, 2008, 2010 and 2014, respectively, which may be related to frequent flood disasters in Chongqing, especially the severe and rare floods that occurred in Chongqing in 2007 [25]. According to literature reports, AHC is most likely to break out and spread after flood. The destruction and pollution of water source and related living infrastructure, the obvious decline of sanitary conditions, and the unstable psychological factors of the population after flood, which may lead to the decline of immunity [26,27]. Furthermore, Kaizhou District, Beibei District and Rongchang District are high-attack areas in Chongqing. This may be relevant

Epidemiology and Infection
to different geographical features and demographic structure, and lack of education promotion, so we should control AHC reasonably according to geographical differences. So before the highincidence season comes, strengthen the health management of key public places and improve the health system. During the epidemic season, hospitals should start specialised temporary outpatient clinics, strengthen the pre-screening and triage of patients, pay attention to disinfection of ophthalmic devices, prevent cross-infection. At the same time, strengthen education and publicity, wash hands frequently, and do a good job in personal and family hygiene and so on.
In this paper, we performed time series analysis on the data from two aspects. Deterministic analysis, we adopted exponential smoothing method to analyse. It has been generally applied in economy, industry, agriculture, transportation, medicine [28]. The other is stochastic analysis, we adopted SARIMA model. ARIMA model has been more and more applied to illustrate the time patterns of malaria [29], tuberculosis [30,31], dengue [32] and other diseases [33,34].
According to the sequence diagram of AHC incidence, there is obvious seasonality and tendency in this sequence. Therefore, we use the Holt-Winters exponential smoothing method for analysis.   Hongfang Qiu et al.
The exponential smoothing Holt-Winters with additive seasonality model and exponential smoothing Holt-Winters with multiplicative seasonality model were established. At the same time, considering that there are some high values in the original sequence, the original data were transformed into the natural logarithm, while the sequence after the natural logarithm transformation can only be transformed into the additive Holt-Winters' seasonal exponential smoothing model. Then the predictive value and prediction graph of the incidence of AHC by three exponential smoothing models were calculated, and the comparison showed that the multiplicative Holt-Winters' seasonal exponential smoothing method had the best predictive effect.
For the random analysis of non-stationary sequences, considering the existence of seasonal factors and trend factors, this paper adopted the SARIMA model for modelling and analysed the four SARIMA models based on the incidence of AHC. By comparing the AIC values of the four models, it was concluded that the SARIMA(1, 1, 2) × (0, 1, 1) 12 model was the best.
Finally, in order to compare the prediction effect of the Holt-Winters multiplicative exponential smoothing model and the SARIMA(1, 1, 2) × (0, 1, 1) 12 model, the MSE and MAPE values of the two models were calculated respectively in this paper. The results showed that the predictive effect of the multiplicative Holt-Winters' seasonal exponential smoothing model was slightly better than that of the SARIMA(1, 1, 2) × (0, 1, 1) 12 model. Therefore, it follows that the exponential smoothing Holt-Winters with multiplicative seasonality model was better for short-term prediction of the incidence of AHC. Moreover, the incidence of AHC showed significant seasonality, the exponential smoothing Holt-Winters with multiplicative seasonality model might be suitable for the forecast of time-series data with trends and seasonality in this paper.
There are some limitations of this study, in this study, only the incidence of AHC was used for preliminary modelling prediction, and many factors affecting the incidence of AHC were not considered in this paper, in addition, we only made short-term predictions. Therefore, in order to establish a long-term stable prediction model, various factors affecting the incidence of AHC need to be considered. In the following research work, we can use hybrid models to analyse or predict disease, such as SARIMA-NAR hybrid model [22], SARIMA-NARNNX hybrid model [35], SARIMA-NARX hybrid model [36] and so on.

Conclusions
The short-term forecast of AHC can evaluate the prevention or control measures. Meanwhile, we can adopt timely and effective countermeasures for the epidemic peak that may occur. In this  Epidemiology and Infection paper, the exponential smoothing based on the multiplicative model could be availably used in the time series analysis of AHC in Chongqing, and all the actual values fell in the confidence interval of the predicted value. It suggests that exponential smoothing can be used to predict the incidence of AHC. So the predictions of the incidence of AHC could generate useful information for designing the strategies of public health services.