Introduction
Acute haemorrhagic conjunctivitis (AHC) is a highly infective eye disease, which is mainly caused by either enterovirus 70 (EV70) or coxsackievirus A24 (CVA24) infection [Reference Jun1, Reference Zhang2]. The main symptoms are conjunctival congestion, burning feeling, photophobia, tears, feeling foreign body and so on. Outbreaks have been found in many countries, such as India, Senegal and so on [Reference Nidaira3–Reference Tavares5]. Although cases occur every month, AHC incidence has obvious seasonal characteristics [Reference Yan6]. The symptoms appear within 2 days after direct contact with the source of infection [Reference Bi7], and the incidence of AHC is related to temperature and humidity [Reference Tang8].
In China, AHC is defined as a notifiable infectious disease [Reference Ling9]. It is a common eye infection disease in our country and has been reported in many cities, the incidence of AHC in Chongqing ranks the top 5 in China [Reference Wang10]. The epidemic circumstance still serious because AHC is highly contagious and there is no specific effective treatment. Modelling and forecasting of the incidence of AHC provide a basis for developing policies and interventions.
Time series analysis is a scientific quantitative prediction of the future trend of diseases based on historical data and time variables. It is a quantitative analysis method without considering the influence of complex factors [Reference Wu11] and widely used in various fields. The exponential smoothing method has the characteristics of simple calculation, no requirements for data distribution, and easy operation. It is a simple and feasible short-term time series prediction method [Reference Mahajan S12]. ARIMA model is one of the most representative and widely applied models in the prediction of time series [Reference Li13], this method is very simple and requires only endogenous variables instead of other exogenous variables.
Although there are some AHC studies based on epidemic disease [Reference Zhang2, Reference Li14, Reference Leveque15], it is the first time to apply exponential smoothing method and SARIMA model for the incidence of AHC in Chongqing. This study used monthly incidence of AHC from 2004 to 2017 as a training set for fitting SARIMA model and Holt-Winters exponential smoothing model. Monthly data of 2018 as the verification set, comparing two models of the incidence of AHC to predict performance, to determine the optimal model to predict the tendency of the incidence of AHC and provide scientific basis for AHC prevention and control in Chongqing.
Materials and methods
Study area and data collection
Chongqing is the fourth city firsthand under the jurisdiction of the Chinese central government, it is the linkage between developed East and central [Reference Li16]. Chongqing (28°10′–32°13′N, 105°11′–110°11E) covers a land area of 82 402 km2 with 38 districts and counties and an inhabitant population of about 31.01 million by the end of 2018. Now, Chongqing has become the largest and most populous city in China [Reference Hu17].
The incidence data are primarily gained from the Chongqing Center of Disease and Control (2004–2018), the data included not only the monthly incidence of AHC but also the age, sex and occupation for each case.
Statistical analysis
Firstly, in this study, using epidemiology to briefly describe the AHC epidemic in Chongqing, including the temporal and spatial distribution, high-incidence age group and so on. Then all the monthly incidence of AHC data were modelled and predicted, all the data analyses were performed using R 3.5.0.
Exponential smoothing model construction
For the deterministic analysis of the series, consider the seasonality of infectious diseases, we adopted the Holt-Winters exponential smoothing model. The model structure is as follows [Reference Wang18]:
The additive Holt-Winters' seasonal exponential smoothing model is:
The multiplicative Holt-Winters' seasonal exponential smoothing model is:
In the equation, a t is the horizontal part of the sequence, b t is the trend part of the sequence, s t is the seasonal factor of the sequence (π is the cycle length of the sequence, and the cycle length studied in this paper is 12,π = 12).
SARIMA model construction
For the randomness analysis of the series, given the seasonal trend, we selected the SARIMA model. According to the literatures, the SARIMA model is developed from the ARIMA model. It uses autoregressive parameters, moving average parameters and the number of differencing passes to describe a series in which a pattern is repeated over time. In this study, SARIMA model was fitted using monthly incidence of AHC as the dependent variable and its past observations as the independent variable. The general model form of SARIMA fitted to the original observation sequence is [Reference Cao19]:
In the equation, B is the backward shift operator, ɛt is the estimated residual at time t with zero mean and constant variance and x t denotes the observed value at time t (t = 1, 2 … k). The process is called SARIMA(p, d, q) × (P, D, Q)S (s is the length of the seasonal period). Where p, d and q are the autoregressive order, number of difference and moving average order, respectively; P, D and Q are the seasonal autoregressive order, number of seasonal difference and seasonal moving average order, respectively [Reference Cao19]. According to the autocorrelation function and partial autocorrelation function of the sequence, the values of the six parameters in the SARIMA model are determined and the fitted SARIMA model is obtained. The model was optimised by comparing Akaike information criterion (AIC), smaller AIC indicate the better fitting model [Reference Peng20].
The SARIMA model was performed using R software, in addition, with a two-sided significance level of P < 0.05.
Model comparison
The predicted accuracy of the Holt-Winters exponential smoothing model and the SARIMA model was compared by calculating the prediction error: the mean square error (MSE) and mean absolute percentage error (MAPE) [Reference Wang21]. Their mathematical formulas are [Reference Wang22]:
Where x t is the actual incidence value, $\mathop {x_t}\limits^\wedge$ is the estimated incidence value, and n is the amount of months for forecasting. The MSE and MAPE were calculated to evaluate the accuracy of the forecast and to choose the best model. A lower MAPE value indicates a better fit of the data.
Results
Descriptive analyses
This study reported 30 686 AHC cases in the past 15 years (2004–2018), in Chongqing, including 18 121 males and 12 565 females, and a male-to-female ratio of 1.44:1. AHC mostly occur within the ages of 10–19 years, what is more, the age group of 10–19 accounted for the 62.69% of all reported cases. The highest percentage of AHC cases was found in students, amount to 0.3975 (n = 12 198), followed by farmers and children.
Figure 1 shows a sequence chart of the incidence of AHC from 2004 to 2018, which is mainly manifested as the epidemic peak from June to September, it shows obvious seasonal characteristics. Furthermore, there were several outliers in 2007, 2008, 2010 and 2014.
Table 1 shows AHC ranked the top 10 regions in terms of prevalence in Chongqing from 2004 to 2018. The main popular areas of AHC were located in Kaizhou District, Beibei District and Rongchang District of Chongqing.
Exponential smoothing model
Given the season and trend effect, the data were fitted by the additive Holt-Winters' seasonal exponential smoothing model, the multiplicative Holt-Winters' seasonal exponential smoothing model, and considering several outliers, we also fitted the additive Holt-Winters' seasonal exponential smoothing model based on the original data after taking the natural logarithm. We calculate the MSE of multiplicative Holt-Winters' seasonal exponential smoothing model is 0.0152, MAPE is 0.1871 (Table 2), while MSE of additive Holt-Winters' seasonal exponential smoothing model is 0.1396, MAPE is 0.7607, and MSE of the processed additive Holt-Winters' seasonal exponential smoothing model is 0.1328, MAPE is 0.7214. A lower MSE and MAPE value indicates a better fit of the model, so the multiplicative Holt-Winters' seasonal exponential smoothing model is the optimum model in the three prediction models, and the three fitting charts were drawn in Figure 2 (the red line represents the original data, the solid blue line represents the fitting value, the dotted blue line represents the 95% confidence interval and the green line represents the actual incidence in 2018).
Table 2 shows that the predicted values by three exponential smoothing methods (model one is additive Holt-Winters' seasonal exponential smoothing method, model two is multiplicative Holt-Winters' seasonal exponential smoothing method, model three is the additive Holt-Winters' seasonal exponential smoothing method after taking the natural logarithm of the original data). In addition, we used the ‘decompose’ function to decompose sequences. As shown in Figure 3, the data have obvious seasonality, tendency, and the residual sequence diagram is stationary, the multiplicative Holt-Winters' seasonal exponential smoothing model is significant.
SARIMA model
Firstly, a sequence diagram was drawn for AHC incidence data (Fig. 4a), the sequence diagram is non-stationary. After the natural logarithm transformation of the original data, we conducted a one-step difference and seasonal difference with a period of 12 to remove seasonal trends. The sequence diagram after the difference is basically stationary (Fig. 4b), and the ADF test results show that the sequence is stationary (P < 0.05).
We estimated the six parameter values of the SARIMA (p, d, q and P, D, Q) model based on the ACF and PACF graphs of the transformed time series. Observed the graphs of ACF and the graphs of PACF (Fig. 5) after data transformation, we developed four models and chosen ‘CSS-ML’ method to estimate the parameters.
SARIMA (1, 1, 1) (0, 1, 1) 12
Table 3 shows AIC values, MSE and MAPE for the SARIMA models to various choices of p and q. Although the MSE and MAPE values of the four model training sets are very close, the SARIMA(1, 1, 2) × (0, 1, 1)12 model had the lowest AIC value. Goodness-of-fit analysis indicated that the SARIMA(1, 1, 2) × (0, 1, 1)12 model fitted the data reasonably well. Moreover, in the residual white noise test of SARIMA(1, 1, 2) × (0, 1, 1)12 model, the P-values of LB statistics were 0.9488 and 0.9981 (P > 0.05), respectively. The residual sequences are pure random, the stationary residual sequence indicates that the fitted SARIMA(1, 1, 2) × (0, 1, 1)12model is sufficient, the SARIMA(1, 1, 2) × (0, 1, 1)12 model was significant. The equation of the SARIMA was:
The predicted values were verified by applying the value from January to December 2018. The actual incidence and forecasted incidence from January to December 2018 in the Table 4 and the prediction diagram is shown in Figure 6 (the blue line represents the actual value, the solid red line represents the predicted value, the dashed red line represents the 95% confidence interval, and the solid grey line represents the actual value). It demonstrated that the results show that the predicted value can fluctuate up and down with the original series. The MSE of prediction was 0.03814 and the MAPE of prediction was 0.3164, respectively. Furthermore, some outliers appeared in the predicted value, for example, the predicted incidence of AHC in September 2018 was 0.9946/100 000, while the actual incidence was 0.3935/100 000. Showing a significant difference, which may be caused by particularly high values in 2007, 2008, 2010 and 2014 year, such as the incidence of AHC was 35.3725/100 000 in September 2010.
Using SARIMA(1, 1, 2) × (0, 1, 1)12model and exponential smoothing Holt-Winters with multiplicative seasonality model to predict the incidence of AHC in 2018, the MSE and MAPE values of the two models were calculated in Table 5, and the comparison showed that the prediction effect of the exponential smoothing Holt-Winters with multiplicative seasonality model was slightly better than that of the SARIMA(1, 1, 2) × (0, 1, 1)12model.
Discussion
AHC is one of the common eye diseases in Chongqing. The annual incidence was 1.569/100 000 at the lowest and 41.1682/100 000 at the highest from 2004 to 2018, in Chongqing, and affects nearly 1.5 times as many males as females in a total of 30 686 reported cases (1.44:1), which is consistent with some previous research results [Reference De23]. The high peaks about age and career of AHC cases were from 10 to 19 years old (62.69%) and students (39.75%), respectively, which indicate that the important protection objectives should be concentrated in students and teenagers. Moreover, although AHC occurs all year, the peak of the incidence of AHC is from June to September each year, showing the obvious periodicity and seasonality [Reference Li24]. In this study, there are several special values that occurred in 2007, 2008, 2010 and 2014, respectively, which may be related to frequent flood disasters in Chongqing, especially the severe and rare floods that occurred in Chongqing in 2007 [Reference Xiao25]. According to literature reports, AHC is most likely to break out and spread after flood. The destruction and pollution of water source and related living infrastructure, the obvious decline of sanitary conditions, and the unstable psychological factors of the population after flood, which may lead to the decline of immunity [Reference Yin-Murphy26, Reference Miyamura27]. Furthermore, Kaizhou District, Beibei District and Rongchang District are high-attack areas in Chongqing. This may be relevant to different geographical features and demographic structure, and lack of education promotion, so we should control AHC reasonably according to geographical differences. So before the high-incidence season comes, strengthen the health management of key public places and improve the health system. During the epidemic season, hospitals should start specialised temporary outpatient clinics, strengthen the pre-screening and triage of patients, pay attention to disinfection of ophthalmic devices, prevent cross-infection. At the same time, strengthen education and publicity, wash hands frequently, and do a good job in personal and family hygiene and so on.
In this paper, we performed time series analysis on the data from two aspects. Deterministic analysis, we adopted exponential smoothing method to analyse. It has been generally applied in economy, industry, agriculture, transportation, medicine [Reference Guan28]. The other is stochastic analysis, we adopted SARIMA model. ARIMA model has been more and more applied to illustrate the time patterns of malaria [Reference Kumar29], tuberculosis [Reference Moosazadeh30, Reference Wang31], dengue [Reference Martinez32] and other diseases [Reference Lin33, Reference Picardeau34].
According to the sequence diagram of AHC incidence, there is obvious seasonality and tendency in this sequence. Therefore, we use the Holt-Winters exponential smoothing method for analysis. The exponential smoothing Holt-Winters with additive seasonality model and exponential smoothing Holt-Winters with multiplicative seasonality model were established. At the same time, considering that there are some high values in the original sequence, the original data were transformed into the natural logarithm, while the sequence after the natural logarithm transformation can only be transformed into the additive Holt-Winters' seasonal exponential smoothing model. Then the predictive value and prediction graph of the incidence of AHC by three exponential smoothing models were calculated, and the comparison showed that the multiplicative Holt-Winters' seasonal exponential smoothing method had the best predictive effect.
For the random analysis of non-stationary sequences, considering the existence of seasonal factors and trend factors, this paper adopted the SARIMA model for modelling and analysed the four SARIMA models based on the incidence of AHC. By comparing the AIC values of the four models, it was concluded that the SARIMA(1, 1, 2) × (0, 1, 1)12 model was the best.
Finally, in order to compare the prediction effect of the Holt-Winters multiplicative exponential smoothing model and the SARIMA(1, 1, 2) × (0, 1, 1)12 model, the MSE and MAPE values of the two models were calculated respectively in this paper. The results showed that the predictive effect of the multiplicative Holt-Winters' seasonal exponential smoothing model was slightly better than that of the SARIMA(1, 1, 2) × (0, 1, 1)12 model. Therefore, it follows that the exponential smoothing Holt-Winters with multiplicative seasonality model was better for short-term prediction of the incidence of AHC. Moreover, the incidence of AHC showed significant seasonality, the exponential smoothing Holt-Winters with multiplicative seasonality model might be suitable for the forecast of time-series data with trends and seasonality in this paper.
There are some limitations of this study, in this study, only the incidence of AHC was used for preliminary modelling prediction, and many factors affecting the incidence of AHC were not considered in this paper, in addition, we only made short-term predictions. Therefore, in order to establish a long-term stable prediction model, various factors affecting the incidence of AHC need to be considered. In the following research work, we can use hybrid models to analyse or predict disease, such as SARIMA-NAR hybrid model [Reference Wang22], SARIMA-NARNNX hybrid model [Reference Wang35], SARIMA-NARX hybrid model [Reference Wang36] and so on.
Conclusions
The short-term forecast of AHC can evaluate the prevention or control measures. Meanwhile, we can adopt timely and effective countermeasures for the epidemic peak that may occur. In this paper, the exponential smoothing based on the multiplicative model could be availably used in the time series analysis of AHC in Chongqing, and all the actual values fell in the confidence interval of the predicted value. It suggests that exponential smoothing can be used to predict the incidence of AHC. So the predictions of the incidence of AHC could generate useful information for designing the strategies of public health services.
Acknowledgements
The authors express their thanks to the Chongqing Municipal Center for Disease Control and Prevention for the disease data as well as the help from teachers of Chongqing Medical University.
Author contributions
The authors Hongfang Qiu and Dewei Zeng contributed equally to this work and should be considered co-first authours, and Mengliang Ye is the corresponding author.
Financial support
This research received no external funding.
Conflict of interest
The authors declare no conflict of interest.
Data availability statement
The incidence of AHC data is gained from the Chongqing Center of Disease and Control, it is confidential data and cannot be uploaded to your organisation. The incidence is equal to the number of new cases of a disease in a population during a period divided by the number of people exposed during the same period.