Large-scale Lassa fever outbreaks in Nigeria: quantifying the association between disease reproduction number and local rainfall

Lassa fever (LF) is increasingly recognised as an important rodent-borne viral haemorrhagic fever presenting a severe public health threat to sub-Saharan West Africa. In 2017–18, LF caused an unprecedented epidemic in Nigeria and the situation was worsening in 2018–19. This work aims to study the epidemiological features of epidemics in different Nigerian regions and quantify the association between reproduction number (R) and state rainfall. We quantify the infectivity of LF by the reproduction numbers estimated from four different growth models: the Richards, three-parameter logistic, Gompertz and Weibull growth models. LF surveillance data are used to fit the growth models and estimate the Rs and epidemic turning points (τ) in different regions at different time periods. Cochran's Q test is further applied to test the spatial heterogeneity of the LF epidemics. A linear random-effect regression model is adopted to quantify the association between R and state rainfall with various lag terms. Our estimated Rs for 2017–18 (1.33 with 95% CI 1.29–1.37) was significantly higher than those for 2016–17 (1.23 with 95% CI: (1.22, 1.24)) and 2018–19 (ranged from 1.08 to 1.36). We report spatial heterogeneity in the Rs for epidemics in different Nigerian regions. We find that a one-unit (mm) increase in average monthly rainfall over the past 7 months could cause a 0.62% (95% CI 0.20%–1.05%)) rise in R. There is significant spatial heterogeneity in the LF epidemics in different Nigerian regions. We report clear evidence of rainfall impacts on LF epidemics in Nigeria and quantify the impact.


Introduction
Lassa fever (LF), caused by Lassa virus (LASV), is increasingly recognised as an important rodent-borne viral haemorrhagic fever presenting a severe public health threat to some of the communities in sub-Saharan West Africa [1]. Discovered in 1969 [2], LF is endemic to much of rural Nigeria and regions in the Mano River Union [3]. LASV transmits from human to human, as well as via the zoonotic cycle [1,3,4]. LF has a high case fatality rate ranging from 1% in the community to over 60% in hospital settings [1,4,5]. The common reservoir of LASV is Mastomys natalensis, one of the most widespread rodent species in sub-Saharan Africa [1,3], which exhibits sensitive population dynamics to the water level, e.g. rainfall, flooded agricultural activities [6,7]. Previous studies have recognised the ecological association between the population levels of rodents and rainfall [8][9][10].
LF epidemics typically start in November and last until May of the following year, with the majority of cases occurring in the first quarter of the following year, in addition to sporadic cases reported throughout the year. The 2017-18 epidemic in Nigeria was an unprecedented LF epidemic in the country's history [11], which resulted in 400 confirmed cases, including 97 deaths, between January and March 2018 [12]. The most recent epidemic in Nigeria has already caused 526 confirmed cases from January to March of 2019, which included 121 deaths [12]. The five states of Edo, Ondo, Ebonyi, Bauchi and Plateau are the only states that have been among the top 10 hit hardest states in terms of number of LF cases in both the 2018 (85.5% of total national cases) and 2019 (85.7% of total national cases) epidemics. While there have been discussions about the association of rainfall level and LF incidence rate [13,14], this association has not yet been demonstrated and quantified. This work aims to study the epidemiological features of epidemics in different Nigerian regions between January 2016 and March 2019. We estimate LF infectivity in terms of the reproduction number (R) and quantify the association between R and state rainfall. We explore the spatial heterogeneity of the LF epidemics and summarise the overall findings with model-average estimates.

Data
Weekly LF surveillance data are obtained from the Nigeria Centre for Disease Control (NCDC), where the data are publicly available from the weekly situation reports released by NCDC [12]. Laboratory-confirmed case time series are used for analysis. We examine the major epidemics that occurred between January 2016 and March 2019 across the whole country and the aforementioned five states that were among the top 10 hardest-hit states in both the 2018 and 2019 epidemics, i.e. Edo, Ondo, Ebonyi, Bauchi and Plateau. The state rainfall records of each state were collected on monthly average basis from the historical records of the World Weather Online website [15]. Figure 1(a) and (b) shows the rainfall time series of the five states and the weekly reported LF cases across the entire Nigeria.

Intuitive coincidence between rainfall and epidemic
To test the credibility of the coincidence between rainfall and LF epidemic, we use a simple statistical regression model of 'case ∼ exp(α × rainfall) + θ', where α and θ are free parameters to be estimated. The 'rainfall' in the model represents the state rainfall time series with lag of 4-9 months. This lag term corresponds to the time interval between the rainfall and the development of rodent population [7]. We check the least-square fitting outcomes of these regression models and select the model of lagged rainfall with the highest goodness-of-fit. The fitting significance is treated as the initiation of the quantitative association between state rainfall and the LF epidemic.

Modelling and estimation
Four different nonlinear growth models are adopted to pinpoint the epidemiological features of each epidemic. The models are the Richards, three-parameter logistic, Gompertz and Weibull growth models. These simple structured models are widely used to study S-shaped cumulative growth processes; e.g. the curve of a single-wave epidemic and have been extensively studied in previous work [16,17]. These models consider cumulative cases with saturation in the growth rate to reflect the progression of an epidemic due to reduction in susceptible pools or a decrease in the exposure to infectious rodent populations. The extrinsic growth rate increases to a maximum (i.e. saturation) before steadily declining to zero. The modelling and fitting via the growth models of the epidemic curve are illustrated in Figure 2.
We fit all models to the weekly reported LF cases in different regions and evaluate the fitting performance by the Akaike information criterion (AIC). We adopt the standard nonlinear least squares (NLS) approach for model fitting and parameter estimation, following [16,18]. A P-value <0.05 is regarded as statistically significant and the 95% confidence intervals (CIs) are estimated for all unknown parameters. As we are using the cumulative number of the LF cases to conduct the model fitting, some fitting issues might occur, as per the studies in King et al. [19], due to the non-decreasing nature in the cumulative summation time series. The models are selected by comparing the AIC to that of the baseline (or null) model. Only the models with an AIC lower than the AIC of the baseline model are considered for further analysis. Importantly, the baseline model adopted is expected to capture the trends of the time series. Since the epidemic curves of an infectious disease commonly exhibit autocorrelations [20], we use autoregression (AR) models with a degree of 2, i.e. AR(2), as the baseline models for growth model selection. We also adopt the coefficient of determination (R-squared) and the coefficient of partial determination (partial R-squared) to evaluate goodness-of-fit and fitting improvement, respectively. For the calculation of partial R-squared, the AR(2) model is used as the baseline model. The growth models with a positive partial R-squared (indicating fitting improvement) against the baseline AR(2) model will be selected for further analyses.
After the selection of models, we estimate the epidemiological features (parameters) of turning point (τ) and reproduction number (R) via the selected models. The turning point is defined as the time point of a sign change in the rate of case accumulation, i.e. from increasing to decreasing or vice versa [16,18]. The reproduction number, R, is the average number of secondary human cases caused by one primary human case via the 'human-to-rodent-to-human' transmission path [18,21]. When the population is totally (i.e. 100%) susceptible, the R will equate to the basic reproduction number, commonly denoted as R 0 [21,22]. The reproduction number (R) is given in Eqn (1), Here, γ is the intrinsic per capita growth rate from the nonlinear growth models and κ is the serial interval of the LASV infection. The serial interval (i.e. the generation interval) is the time between the infections of two successive cases in a chain of transmission [21,[23][24][25]. The function h(·) represents the probability distribution of κ. Hence, the function M(·) is the Laplace transform of h(·) and specifically, M(·) is the moment generating function (MGF) of a probability distribution [21]. According to previous work [26], we assume h(κ) to follow a Gamma distribution with a mean of 7.8 days and a standard deviation (SD) of 10.7 days. Therefore, R can be estimated with the values of γ from the fitted models [18,21,27,28]. The state Rs were estimated from the γs of the fitted epidemic growth curves of each state. Similarly, the national Rs are estimated from the γs of the epidemic growth curves fitted to the national number of cases time series in different epidemic periods. We then summarise the κ and R estimates via the AIC-weighted model averaging. The AIC weights, w, of the selected models (with positive partial R-squared) are defined in Eqn (2), Here, AIC i is the AIC of the i-th selected model and the AIC min is the lowest AIC among all selected models. Thus, the i-th selected model has a weight of w i . The model-averaged estimator is the weighted average of the estimates in each selected model, which has been well studied in previous work [16,29].
For the AIC-based model average of the R, there could be the situation that no growth model is selected according to the partial R-squared. In such cases, instead of the model average, we report the range of the R estimated from all growth models.
Testing the spatial heterogeneity of the LF epidemics After finding the model-averaged estimates, we apply Cochran's Q test to examine the spatial heterogeneity of the epidemics in different regions over the same period of time [30]. For instance, we treat the model-averaged R estimates as the univariate meta-analytical response against different Nigerian regions (states) and further check the heterogeneity by estimating the significance levels of the Q statistics. A P-value <0.05 is regarded as statistically significant.

Association between rainfall and reproduction number
Similar to the approach in the previous study [31], the association between the state rainfall level and LASV transmissibility are modelled by a linear mixed-effect regression (LMER) model in Eqn (3), Here, E(·) represents the expectation function and j is the region index corresponding to different regions (states). Term c j is the interception term of the j-th region to be estimated and it is variable from different regions, serving as the baseline scale of transmissibility in different states. The term t denotes the cumulative lag in the model and 〈rainfall j,t 〉 represents the average monthly rainfall of the previous t months from the turning point, τ, of the j-th region. The range of lag term, t, will be considered from 4 to 9 months, which is explained by the time interval between the peak of the rainfall and the peak of rodent population [7]. As illustrated in Figure 3, the reproduction numbers, R j s, are estimated for different epidemics from the selected growth models. The regression coefficient, β, is to be estimated. Hence, the term (e β -1) × 100% is the percentage changing rate (of R), which can be interpreted as the percentage change in transmissibility due to a one-unit (mm) increase in the average of the monthly rainfall level over the past 7 months. The framework of the regression is based on the exponential form of the predictor to model the expectation of transmissibility (e.g. R); this framework is inspired by previous work [32][33][34][35]. To quantify the impacts of state rainfall, we calculate the percentage changing rate with different cumulative lags (t) from 4 to 9 months and estimate their significant levels. Only the lag terms (t) with significant estimates are presented in this work. We present the analysis procedure in a flow diagram in Figure 3. All analyses are conducted by using R (version 3.4.3 [36]) and the R function 'nls' is employed for the NLS estimation of model parameters.

Results and discussion
The rainfall time series of the five states and the weekly reported LF cases of the whole of Nigeria are shown in Figure 1(a) and (b). We observe that the major LF epidemics usually occur in Nigeria between November and May of the following year. The cumulative lagged effects were observed via matching the peak timing of the rainfall and epidemic curves. In Figure 1(c), we shift the rainfall time series of the five states by + 6 months to match the trends of the national LF epidemic curve in Nigeria. In Figure 1(d) and (e), we find that the fit has a P-value <0.0001, which indicates a statistically significant association between the LF cases and shifted rainfall curve.
We fit four different growth models to the LF confirmed cases and estimate the model-average reproduction number (R) after model selection. We show the growth model fitting results in Figure 4 and the model estimation and selection results in Table 1. Most of the models have positive partial R-squared against the baseline AR(2) model. Most of the regions exhibit an epidemic turning point (τ) ranging from the epidemiological week (EW) 4-10, i.e. from the end of January to mid-March, in each year. Out of four epidemics in the states of Bauchi and Plateau, there are three estimated τs after EW 10 ( Table 1)   Many previous studies adopted the instantaneous reproduction number, commonly denoted by R t , which can be estimated by a renewable equation, to quantify the transmissibility of infectious diseases [21,23,24,37,38]. The factors that affect the changing dynamics of R t include (i) the depletion of the susceptible population [32] or decrease in the exposure to infectious sources, (ii) the change, usually it is the improvement, in the unmeasurable disease control efforts, e.g. contract tracing, travel restriction, school closure, etc., [39][40][41][42] and local awareness of the epidemic [33], and (iii) the natural features of the pathogen, e.g. its original infectivity and other interepidemic factors [32,33,35].
In this work, we choose to use the average reproduction number (R) rather than R t , as the measurement of the LASV transmissibility. The estimated R summarises the LASV transmissibility over the whole period of an epidemic. The reasons why we prefer R rather than R t are as follows. First, the temporal changes of the susceptible population or decrease in the exposure to infectious sources are removed from the R estimates due to the nature of the growth models. Second, the changes of the susceptible population and/or disease awareness or control measures and the effect of the rainfall cannot be disentangled in the time-varying reproduction number, R t , the average reproduction number (R) adopted is a better proxy to explore the association between LF infectivity and rainfall. With respect to point (iii) and other heterogeneities of epidemics in different regions, we account for this issue by including the 'region' dummy variables in the LMER model in Eqn (3). These dummy variables serve as random effects to offset the regional heterogeneities of LF epidemics. Therefore, we can then quantify a general effect, i.e. the β in Eqn (3), of the lagged rainfall on the LASV R estimate among different Nigerian places.
The association between total rainfall in a state and the LASV transmissibility (R) is modelled and quantified by the LMER model. In Figure 5, we find a positive relation between rainfall and R. The estimated changing rate in R under a one-unit (mm) increase in the average monthly rainfall is summarised with different cumulative lag terms from 4 to 9 months (the t in Eqn (3)). The range of lag in the rainfall from 4 to 9 months had previously been explained by the time interval between the peak of the rainfall and the peak of the rodent population [7]. The estimates of the rainfall-associated changing rate in R with different lag terms were summarised in Table 2. We report the most significant (i.e. with the lowest P-value) regression estimates that appear with a cumulative lag of 7 months. The habitats of the LASV reservoir, i.e. rodents, include irrigated and flooded agricultural lands that are commonly found in and around African villages [6]. The 7-month lag also coincides with the period between the dry and rainy seasons [43]. The association between rodent population dynamics and rainfall levels has been demonstrated in a number of previous studies [6][7][8][9][10]. Hence, we consider the 7-month lagged estimation as our main results. Namely, a one-unit (mm) increase in the average monthly rainfall over the past 7 months is likely to cause a 0.62% (95% CI 0.20%-1.05%) rise in the R of the LF epidemic. We also remark that this 'one-unit (mm) increase in the average monthly rainfall over the past 7 months' is equivalent to '7-unit (mm) increase in the total rainfall over the past 7 months'. The present finding of the impact of lagged rainfall on LF epidemics suggests that the knowledge of such weather-driven epidemics could be gained by referring to past rainfall levels. For instance, if a relatively high amount of rainfall occurs, local measures, such as rodent population control, could be effective to reduce   the LF risk. This speculation could also be verified by examining the rodent population data of the Nigerian regions included in this work. The findings in this work are of public health interest and are helpful for policymakers in LF prevention and control. On the one hand, our findings suggest the existence of an association between rainfall and LASV transmissibility, which could be affected by the population dynamics of rodents [13]. On the other hand, the positive relation between rainfall and R indicates that rainfall, particularly in states with a high LF risk, can be translated as a warning signal for LF epidemics. The modelling framework in this study should be easily extended to other infectious diseases.

Limitation
Our work contains limitations. As in some African countries, the weather data are available only from a limited number of observatory stations and thus it is not sufficient to capture more accurate spatial variability. In this work, instead of exploring the spatial differences in the associations between rainfall and LF epidemic, we relaxed the setting and studied a general relationship. We qualified the general rainfall-associated changing rate of R in Nigeria. For the transmissibility estimation, our growth modelling framework can provide the estimates of R, but not the basic reproduction number commonly denoted as R 0 . However, according to the theoretical epidemiology [22,27,35,44,45], the R 0 can be determined by R 0 = R/S, where S denotes the population susceptibility. Although S is not involved in our modelling framework, the information of S could be acquired from local serological  surveillances. The existing literature reported 21.3% seroprevalence among Nigerian humans by the enzyme-linked immunosorbent assay (ELISA) [46]. Hence, the R 0 can be calculated as 1.63 by using S = 1-21.3% = 0.787 and the R = 1.28 as the average of the 2016-18 LF epidemics. This was a data-driven modelling study, and we quantified the effect of rainfall as a weather-driven force of R based on previous ecological and epidemiological evidences [7,43]. Since the transmission of LASV mainly relies on the rodent population, the factors including seasonality, agricultural land-using, subtropical or tropical forest coverage that could impact rodent ecology should be relevant and helpful in the analysis. However, due to availability of data, the agricultural land-using factors, e.g. pastureland, irrigated land, flooded agricultural land usage and forest coverage were absent in our analysis, which should be studied in the future if they become available.

Conclusions
The LF epidemic reproduction numbers (R) of the whole of Nigeria in 2017-18 (R = 1.33 with 95% CI 1.29-1.37) and 2018-19 (R ranged from 1.08 to 1.36) are significantly higher than in 2016-17 (R = 1.23 with 95% CI 1.22-1.24). There is significant spatial heterogeneity in the LF epidemics of different Nigerian regions. We report clear evidence of rainfall impacts on LF epidemics in Nigeria and quantify this impact. A one-unit (mm) increase in the average monthly rainfall over the past 7 months could cause a 0.62% (95% CI 0.20%-1.05%) rise in the R. The state rainfall information has potential to be utilised as a warning signal for LF epidemics.
Data. All data used for analysis are freely available via online public domains [12,15].