Imported cases and minimum temperature drive dengue transmission in Guangzhou, China: evidence from ARIMAX model

Abstract Dengue is the fastest spreading mosquito-transmitted disease in the world. In China, Guangzhou City is believed to be the most important epicenter of dengue outbreaks although the transmission patterns are still poorly understood. We developed an autoregressive integrated moving average model incorporating external regressors to examine the association between the monthly number of locally acquired dengue infections and imported cases, mosquito densities, temperature and precipitation in Guangzhou. In multivariate analysis, imported cases and minimum temperature (both at lag 0) were both associated with the number of locally acquired infections (P < 0.05). This multivariate model performed best, featuring the lowest fitting root mean squared error (RMSE) (0.7520), AIC (393.7854) and test RMSE (0.6445), as well as the best effect in model validation for testing outbreak with a sensitivity of 1.0000, a specificity of 0.7368 and a consistency rate of 0.7917. Our findings suggest that imported cases and minimum temperature are two key determinants of dengue local transmission in Guangzhou. The modelling method can be used to predict dengue transmission in non-endemic countries and to inform dengue prevention and control strategies.


Introduction
Dengue is the fastest spreading mosquito-borne viral disease worldwide and is primarily transmitted by Aedes albopictus and Aedes aegypti. It affects 3.6 billion people, with 96 million out of approximately 390 million infections annually resulting in symptomatic infection [1]. Dengue circulates in more than 100 countries each year and has the great impact in the Americas, Asia, Africa and Australia [2], imposing a great economic burden on affected areas with accounting for approximately 1 billion US dollars in annual economic cost for Southeast Asian countries [3].
The causative agent of dengue is dengue virus (DENV), which is antigenically divided into four serotypes (DENV-1, DENV-2, DENV-3 and DENV-4) [4]. The spectrum of dengue infection ranges from inapparent infection, non-specific febrile illness and self-limited dengue fever (DF) to severe dengue, which manifested by dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS). A second infection caused by a different serotype can increase the risk for severe disease mainly due to antibody-dependent enhancement (ADE) [5]. Rapid fluid loss into tissue spaces can lead to death from hemoconcentration and hypotension [2]. At present, the licensed vaccine must be combined with effective vector control to prevent large-scale outbreaks due to its modest efficacy [6].
Dengue was first identified in mainland China in 1978 from Foshan City after 1949, that is adjacent to Guangzhou [7] and has been a notifiable disease since 1 September 1989 [8]. Dengue transmission mostly occurred in Guangzhou City with more than 50% of the cases in mainland China reported since 1989 [9]. All four serotypes had circulated in Guangzhou since 1978 with one serotype dominating each transmission year [9]. Reported indigenous cases meaning locally acquired dengue infections in Guangzhou soared dramatically beyond expectations in 2014, with 37 340 indigenous cases that accounted for approximately 80% of the total reported cases in mainland China [10]. However, the critical risk factors of dengue transmission in Guangzhou are still poorly understood.
Statistical models indicate that dengue incidence on a local scale is associated with imported cases also named travel-related dengue infections, vector densities, climate variables, socioeconomic and environmental factors, population immune status and human movement [11][12][13][14][15]. However, the exact relationship between the risk factors and dengue incidence varies by location; for instance, a variable can increase the dengue risk in some districts but decrease it in other areas [16]. Early importation and larger numbers of imported cases in the calendar year have been identified as the most important determinants underlying dengue transmission in China [10], Japan [17] and Australia [18]. Most studies suggest that precipitation is positively correlated with dengue risk in tropical countries; however, a few epidemics have also been documented in areas with low rainfall or larval indices, such as Thailand in 1987 [19] and Singapore in 1986 [20]. Warmer temperature is associated with dengue incidence in Southeast Asia [21]. Most dengue transmission occurs when the temperature is higher than 20°C. However, extremely high temperatures (>36°C) can also limit the feeding activity of female mosquitoes to restrain transmission [16]. A series of statistical methods including negative binomial regression model, generalised additive model, times series were used in such studies. And due to the autocorrelation of dengue occurrence, times series model showed easy operation and powerful prediction ability in the actual application.
In this study, we use prewhitened cross-correlation and autoregressive integrated moving average (ARIMA) model to analyse the dengue occurrence in Guangzhou from 2001 to 2016. We seek to improve the understanding of critical risk factors for dengue transmission in Guangzhou, with the aim of developing a predictive dengue model to inform dengue prevention plans.

Ethics statement
The present study was fully reviewed and approved by the Ethics Committee of the Guangzhou Center for Disease Control and Prevention (GZCDC) and the Ethics Committee of the School of Public Health of Sun Yat-sen University. All of the patient data were de-identified and the data were analysed anonymously.

Research area
Guangzhou is the capital city of Guangdong Province in southern China. With a current population of more than 12.84 million, it is one of the most highly populated cities in the world. Located at 112°57 E to 114°3 E and 22°26 N to 23°56 N and cover a total area of 7434.40 km 2 , Guangzhou has 10 administrative districts (Liwan, Yuexiu, Haizhu, Baiyun, Tianhe, Huangpu, Luogang, Panyu, Nansha and Huadu) and two satellite cities (Conghua and Zengcheng) (Fig. 1). Guangzhou is one of the most urbanised areas in China and is the most important center of international commerce in south China. It has a humid subtropical climate that is influenced by Asian monsoons [22], resulting in a wet and warm climate that is favourable for vector reproduction [10]. Over the past three decades, Aedes albopictus was observed to be the sole vector for dengue transmission in Guangzhou with the absence of Aedes aegypti. According to vector surveillance data, Aedes albopictus ranked second at a proportion of approximately 5.89% of all adult mosquitos, following only Culex fatigans (89.90%) in Guangdong province [23].

Data sources
Monthly dengue cases aggregated by onset date from 2001 to 2016 were collected by the GZCDC in accordance with the Law on Prevention and Treatment of Infectious Diseases of China. Once diagnosed in medical institutions, the case must be reported to the web-based National Notifiable Infectious Disease Reporting Information System within 24 h according to the National Diagnostic Criteria for Dengue Fever (WS216-2008) published by the Chinese Ministry of Health [9]. Through the face-to-face interviews conducted by GZCDC, the cases are classified as either imported or indigenous based on the patient's travel history and dengue's human incubation period.
Monthly vector surveillance which was conducted in each midmonth days in total 36 representative communities, with the Breteau index (BI) for indoor larval density, standard space index (SSI) for outdoor larval density and adult mosquito density index (ADI) monitored by the GZCDC [24]. The definitions of the indices are as follows: BI = the number of containers positive for larvae or pupae per 100 houses, SSI = the number of containers positive for larvae or pupae per 100 standard spaces in which one stand space of 15 m 2 outdoors and the ADI = the number of Aedes per hour by the human snare method. Monthly temperature and precipitation data for Guangzhou from 2001 to 2016 were downloaded from the China Meteorological Data Sharing Service System (http://cdc.nmic.cn/). The mean (T mean ), maximum (T max ) and minimum temperatures (T min ) and the total (P total ), mean (P mean ) and maximum precipitation (P max ) were used in this study.

Data processing and time series analysis
The time series data collected above were divided into two parts as follows: the first 168 months (2001-2014) were used to construct the time series model with the Box and Jenkins approach [25,26] and the last 24 months (2015-2016) was used to validate the predictive power of the model.

ARIMA model for dengue occurrence
Because of temporal autocorrelations between data points and dengue's seasonal transmission patterns, time series analyses, especially ARIMA model, are effective to model dengue transmission. The model provides a general framework for analysing seasonal disease incidence data such as dengue [15], influenza [27], leptospirosis [28], tuberculosis [29], malaria [30] and brucellosis [31]. Accounting for temporal autocorrelations between variables can lead to higher accuracy using ARIMA on the same data [28]. Furthermore, it can incorporate external factors, such as climatic variables, which can improve the fitting and prediction accuracy; under these circumstances, the model as an extension of the ARIMA model is named the ARIMAX model.
The model expression ARIMA ( p, d, q) (P, D, Q) s [32], p is the order of the autoregression (AR) process, q is the number of moving average (MA) terms, d represents the differencing process to form a stationary times series, and P, D and Q are the seasonal orders of the AR, differencing and MA processes, respectively. Additionally, s denotes the seasonal period. Differencing was used to stabilise the variance and mean to fit the model's requirements [33]. In the ARIMA model, if the time series was nonstationary, the log transformation and differencing were used to remove the nonstationary terms. In our study, we generated a log transformation plus 2 (log (the number of indigenous cases +2)) for the monthly indigenous dengue cases. We added 2 to case incidence data to avoid zero prior to taking the log transformation and modeling processes. The predicted number of indigenous dengue cases at time t (Y t ) was determined by the model Epidemiology and Infection 1227 equation (1) [26,33]: here, θ(B) is the operator for MA, Θ (B s ) is the seasonal MA operator, φ(B) is the AR operator, Φ(B s ) is the seasonal AR operator, (1 − B) d and (1 − B s ) D denote the ordinary and seasonal difference components, respectively, a t and Y t represent the white noise and the dependent variable, respectively [33]. The p and q orders can be inferred using the cutoff time lag of the autocorrelation function (ACF) and the partial autocorrelation function (PACF). If the time series showed a seasonal pattern with an ACF peak at a time lag of s, then the same procedure was applied to the seasonal ARIMA model. The Ljung-Box test was used to test the model residuals did not show residual autocorrelation.

Cross-correlation analysis
Due to strong autocorrelations in the data, correlations of the time series of the monthly number of indigenous cases with imported cases, vector densities and climate were difficult to identify. In this study, a prewhitening process was applied to the data among the multiple external regressors. Prewhitening to avoid common trends between incidence and risk factors [26], was conducted in three steps. First, we determined a time series model for the y-variable in step 1 and stored the residuals from this model and then filtered the x-variable series using the y-variable model in step 2 and finally examined the cross-correlation function (CCF) between the residuals from Step 1 and the filtered x-values from Step 2. The CCF of the prewhitened imported cases, vector density and climate variables with the indigenous cases time series were calculated to identify the significant time lags. The factors that did not show a significant time lag not included in the model analysis [34]. Based on the results from previous studies as well as biological and epidemiological plausibility, we selected those positively significant lag values for the next step of the analysis with maximum lag steps of four.

Univariate and multivariate ARIMAX analyses
The lag value of the statistically significant factors identified by the cross-correlation analysis was incorporated as external covariates into the ARIMAX model constructed above. The ARIMAX model is described by equation (2) [26,33]: here, X is the external regressor, which may be univariate or multivariate. The other parameters are as described in the ARIMA model above. In this study, we first took a single lag value into the univariate ARIMAX model. For the multivariate analysis, the incorporated factors were selected from the factors with significant regression coefficients in the univariate analysis (P < 0.10). P < 0.05 in the regression coefficient test was considered statistically significant in the multivariate model. The coefficients of the model were estimated using the maximum likelihood method. The AIC was used to identify the best fitting model while taking into account possible overfitting of the data. In this study, RMSE was also used to determine the best performance. For the model diagnostics, the ARIMA and ARIMAX models were used to test whether model residuals were significantly different from white noise. Finally, we used cross-validation of one-step method using the selected model to measure model performance.

Model validation for testing outbreak
According to the Technical Guidelines for Dengue Control and Prevention enacted by Chinese Center for Diseases Control and Prevention, a dengue outbreak is as the appearance of three confirmed indigenous cases in a period no longer than 15 days in a confined geographic area, such as one community, village, school or other collective units.
To investigate model robustness, the cross-validation for the testing outbreak was applied. However when applied to time series data traditional leave-one-out or k-fold cross validation may produce overly optimistic results due to temporal autocorrelation in the data, violating usual assumptions of independence. In our time series analysis, cross-validated error taking into account temporal dependence in the data was obtained as follows [35]: (1) Fit the model to the observed data Y 1 , Y 2 ,…,Y t and let Y t+1 denotes the predicted value of the next observation. Then compute the error (e t+1 = Y t+1 − Y t+1 ) for the forecast observation.
(2) Repeat step 1 for t = k,…, n − 1 where k is the minimum number of observations needed for fitting the model and n is the total number of observations. (3) Compute the mean RMSE from e k+1 ,…, e n . In our paper, we took k as 168 for our initial training data from January 2001 to December 2014 and the following 24 observations as testing data with initial forecast observation on January 2015 and the final forecast observation on December 2016.
The receiver operating characteristic curve (ROC) was used to determine the optimal cut-off point for the occurrence of an outbreak. Model accuracy was evaluated by the area under the ROC curve (AUC), larger values indicating higher accuracy. Afterwards the sensitivity, specificity and consistency rate for predicting outbreaks were computed, based on evaluating model error on testing data from January 2015 to December 2016 [36].
The R software (version 3.2.2, the R foundation for Statistical Computing, Vienna, Austria) with the TSA, forecast and pROC packages were used for the time series analysis and graphical display. To stabilise the variance and subsequent differencing of indigenous dengue case incidence data, we calculated the natural log transformation plus 2 (log (number of indigenous cases +2)) for the monthly indigenous cases in the study, which is depicted in Figure 2.

Dengue occurrence patterns and their associated factors
The temperature (T mean , T max and T min ), precipitation (P total , P max and P mean ) and vector density (BI, SSI and ADI) variables showed similar patterns, with high values in the summer and autumn. The average monthly number of imported cases over the study period is plotted in Figure 3 and Supplementary  Table S1. A monthly number of imported cases generally proceeded indigenous cases, which arose through the local transmission. The highest total precipitation (P total = 3006.5 mm) occurred in 2016, although there were 204 reported indigenous cases in that year; meanwhile, the lowest total precipitation (P total = 1338.7 mm) occurred in 2003, when there were 76 reported indigenous cases.

ARIMA model
The best fitting model selected according to AIC was the ARIMA (0,1,1)(0,0,2) 12 model with an AIC of 400.8343 and RMSE of 0.7762. The ACF and PACF plots were used to estimate the temporal autocorrelation, as demonstrated in Figure 4. The Ljung-Box test indicated that model residuals did not significantly depart from a white noise process (P > 0.05).

Cross-correlation analysis
After prewhitening, we calculated the cross-correlation coefficients between the mosquito densities, climate variables and imported cases. Table 1 shows that the log(indigenous cases + 2) time series correlated positively with the imported cases at the 0 and 1-month lags, the mean temperature at lag 0, the maximum temperature at lag 0 and the minimum temperature at lag 0.

Univariate and multivariate ARIMAX analyses
Based on the previous cross-correlation analysis, we tested the ARIMAX model by incorporating the imported cases at lag 0 and lag 1, mean temperature at lag 0, maximum temperature at lag 0 and minimum temperature at lag 0 as external regressors based on the ARIMA (0,1,1)(0,0,2) 12 model from a log (indigenous cases +2) time series.

Discussion
In this study, we developed and evaluated time series models to characterise the risk factors of local dengue transmission in Guangzhou to inform control measures. We found that the number of imported cases and the minimum temperature were the important variables associated with local dengue transmission in Guangzhou City. And the multivariate ARIMAX model that included both the imported cases and the minimum temperature at lag 0 had a better fit than the inclusion of one variable alone.
This modeling technique can be a useful tool for planning local control interventions and could be implemented during routine dengue surveillance in Guangzhou. A higher number of overseas travellers boost the risk of global dengue transmission [37]. Previous studies of phylogenetic analysis and travel history surveys suggest that imported cases could affect dengue transmission [9,38]. However, few studies have quantitatively modeled imported cases with dengue occurrence. In the unprecedented 2014 dengue outbreak in Guangzhou, the number and timing of the imported dengue cases strongly determined the local occurrence and outbreak size [10]. Virus dissemination due to population movement contributes to an increase in the outbreak potential in non-endemic areas [33]. Increased tourism and commerce can introduce the dengue virus into  Temperature also influences the dengue transmission dynamics, mainly through indirect effects. Increasing temperatures can increase the biting frequency of Aedes and the survival rates during emergence [39] and reduce the extrinsic incubation period of dengue in the mosquito [40]. However, there is little evidence that the highest temperature drives patterns of dengue epidemics in  ICs, imported cases; BI, Breteau index; SSI, standard space index; ADI, adult density index; T mean , mean temperature; T max , maximum temperature; T min , minimum Temperature; P total , total precipitation; P mean , mean precipitation; P max , maximum precipitation. '+', positive statistical significance; '.' no statistical significance; '−', negative statistical significance.

Epidemiology and Infection 1231
Southeast Asia [21]. Interestingly, the minimum temperature played a significant role in dengue transmission in this study. Several studies have highlighted the association between the minimum temperature and dengue transmission. In Taiwan, the minimum temperature correlated with dengue cases at a lag of 1-3 months [41]. In Guadeloupe, minimum temperature with a 5 week time lag was the best climatic variable for the prediction of dengue outbreaks [33]. Thus, the evidence suggests that more severe dengue transmission occurs at higher monthly minimum temperature.
Prewhitening was conducted prior to calculating the crosscorrelation between the dengue cases and risk factors. This process can measure the association between temporally autocorrelated data, whereas the Pearson or Spearman correlation requires independent and identically distributed observations [15]. Finally, we found that dengue cases significantly correlated with both the number of imported cases and the minimum temperature at lag 0. However, no positive correlation was found with the BI, SSI, ADI and precipitation at any lag. The mean, maximum and minimum temperatures all showed a strong correlation with the number of dengue cases at lag 0. The models incorporating these variables as external regressors indicated improved model performance [41]. At present, it is generally known that the increasing risk of dengue transmission is dependent on vector density, however, its correlation with dengue is generally poor, as shown in our study [15]. Excepting for vector density, this phenomenon may in part be due to the fact, vector capacity is also related with biting rate, the probability of vector survival, extrinsic incubation period and vector competence [42]. This result may also be associated with the method itself due to the subjectivity of the vector density survey and potential bias in data collection. Especially for adult mosquito density, which theoretically shows a higher correlation with dengue risk than larval indices because only the adult life stage directly contributes to dengue transmission dynamics was also in poor association with dengue incidence.
Our study provides deep insights into the dengue risk in Guangzhou through the characterization of the temporal autocorrelation structure. The results of this study are consistent with previous research showing that the number of imported cases is highly correlated with local transmission in Guangzhou [9,10]. The methodology used in the study is an effective complement for understanding dengue risk factors in the region of Guangzhou and can be applied in high-risk areas of south China. As the imported cases are highly associated with local transmission, port inspection and quarantine may be able to play a critical role in dengue control in the region. The quarantine mainly targets fever cases (⩾38°C) entering Guangzhou by plane, with the serum collected and polymerase chain reaction laboratory test followed. Approximately 30% of the total imported cases were identified by port quarantine in Guangzhou. When imported cases escaped detection by port monitoring systems, the rapid rate of case detection in local health institutions is also important in control practices, which can be improved by training to increase knowledge, attitudes and practices of both health workers and the public [43]. When the minimum temperature arises, the vector control efforts should be strengthened, especially in the area with a high likelihood of imported cases. Due to Guangzhou as the imported dengue area, the strategy under this study is an economic and effective way to prevent dengue local outbreak.
Admittedly, there were limitations in our study. The ARIMAX model can only identify a correlation between local dengue incidence and the risk variables but cannot identify causal relationships [27]. Furthermore, our study did not take into account the effects of human movement [14], water supply [13], land  utilisation [44] and socio-economic factors [45]. These factors also play certain roles in determining dengue transmission in the study regions. Due to the experiences and lessons from 2014, Guangzhou authorities took strict intervention measures, especially in hospital isolation and early diagnosis using the NS1 antigen test [46], to successfully reduce the number of local cases during the 2015 transmission season. Finally, due to the lag 0 for both factors including in the models, it is reasonable for fitting the model conforming to the actual public health practices. Using time interval with monthly data, when an imported case is introduced, it will cost longest 20 days roundly to disperse, including an incubation period about 3-15 day and longest infectious period 5 days after onset which is less than 1 month. Therefore, the introduced imported case in a current 1 month is more important. Furthermore, it is a bit delayed to use the onset date for monthly aggregation analysis instead of the infection date or biting date. However, it would be more objective if the accurate date could be confirmed. The frequency and scale of dengue transmission in China have increased in recent years [8]. A better understanding of the relationship between dengue occurrence and various risk factors can be used to prevent further large outbreaks. Our ARIMAX modelling provides a practical approach for monitoring dengue in Guangzhou City. The model and its outcome can give objective guidance for local health authorities in order to better understand future large epidemics.