Socio-economic factors associated with the incidence of Shiga-toxin producing Escherichia coli (STEC) enteritis and cryptosporidiosis in the Republic of Ireland, 2008–2017

The Republic of Ireland (ROI) currently reports the highest incidence rates of Shiga-toxin producing Escherichia coli (STEC) enteritis and cryptosporidiosis in Europe, with the spatial distribution of both infections exhibiting a clear urban/rural divide. To date, no investigation of the role of socio-demographic profile on the incidence of either infection in the ROI has been undertaken. The current study employed bivariate analyses and Random Forest classification to identify associations between individual components of a national deprivation index and spatially aggregated cases of STEC enteritis and cryptosporidiosis. Classification accuracies ranged from 78.2% (STEC, urban) to 90.6% (cryptosporidiosis, rural). STEC incidence was (negatively) associated with a mean number of persons per room and percentage of local authority housing in both urban and rural areas, addition to lower levels of education in rural areas, while lower unemployment rates were associated with both infections, irrespective of settlement type. Lower levels of third-level education were associated with cryptosporidiosis in rural areas only. This study highlights settlement-specific disparities with respect to education, unemployment and household composition, associated with the incidence of enteric infection. Study findings may be employed for improved risk communication and surveillance to safeguard public health across socio-demographic profiles.


Introduction
Shiga-toxin producing Escherichia coli (STEC) is an enteric pathogen causing gastroenteritis with symptoms ranging from diarrhoea and haemorrhagic colitis to haemolytic uraemic syndrome, and death in severe cases [1,2]. Cryptosporidium spp. are protozoan parasites characterised by self-limiting diarrhoeal disease in immunocompetent people, but with potentially severe clinical symptoms among immunocompromised individuals, young children and older adults [3,4]. Over the past decade, crude incidence rates (CIRs) of STEC enteritis and cryptosporidiosis in Ireland have been among the highest in Europe, with CIRs of 20.0 (STEC) and 13.2 (cryptosporidiosis) cases per 100 000 population, respectively, notified during 2018 [5,6].
The spatial distribution of both infections exhibits a strong urban-rural partition in Ireland, with the incidence rates of both infections significantly higher in rural areas [7][8][9]. Dominant transmission pathways include direct contact with farm animals, consumption from private (ground)water sources and high reliance on (poorly managed) private domestic wastewater treatment systems (i.e. septic tanks); all factors typically associated with rurality [8][9][10]. International studies have reported associations between infection incidence and a broad range of social determinants, including measured deprivation [11][12][13]. For example, the incidence of food-and waterborne infections in urban areas is frequently associated with poor sanitation infrastructure, synonymous with economically deprived areas and person-to-person transmission (i.e. elevated population densities) [11][12][13]. Conversely, infectious diseases in rural areas have been attributed to barriers accessing healthcare [14] and poor water resource infrastructure [15].
While the environmental determinants of STEC enteritis and cryptosporidiosis in Ireland are relatively well understood (e.g. private well use, high vulnerability hydrogeological settings, high livestock densities, wastewater infrastructure) [8,15], to date, the socio-economic drivers of infection have not been comprehensively explored. Accordingly, the current study sought to examine the presence of associations between individual components of the Pobal HP deprivation index [16] with spatially aggregated cases of STEC enteritis and cryptosporidiosis in Ireland from 2008 to 2017, with analyses delineated by urban/rural classification due to the aforementioned divide associated with both infections in Ireland.

Infection data
Laboratory-confirmed, irriversibly anonymised case data were obtained from the Computerised Infectious Disease Reporting (CIDR) database (http://www.hpsc.ie/CIDR/), an information system used for collating notifiable (communicable) infection data in Ireland [17]. Primary and index cases of STEC enteritis were included for analyses, with primary and index cases defined as sporadic (i.e. not associated with a confirmed outbreak or cluster) or the first case identified as part of a recognised outbreak/cluster, respectively [8]. Confirmed STEC cases occurring between 1 January 2013 and 31 December 2017 were included for analyses, with STEC cases notified prior to 2013 excluded to ensure consistency regarding laboratory confirmation protocols.
Data for notified primary and secondary cases of cryptosporidiosis were obtained from 1 January 2008 to 31 December 2017, with a secondary infection defined as a laboratoryconfirmed case with an epidemiological link to another notified case [18]. Cryptosporidiosis cases prior to 1 January 2008 were excluded to avoid biases associated with a large cryptosporidiosis outbreak which occurred in the west of Ireland during spring 2007 [19]. All individual cases of infection were geo-referenced to one of 18 488 Central Statistics Office (CSO) Small Area (SA) centroids, using a previously developed protocol [20]. SAs comprise 80-120 dwellings, geographically defined for population census reporting, and are the smallest spatially referenced unit in Ireland [21]. A binary variable was developed to indicate the presence/absence of a confirmed infection within all individual SAs over the study period.

Pobal HP deprivation index data
The Pobal Haase Pratschke (HP) deprivation index comprises 16 individual components representing the three main dimensions of deprivation, as measured in the Republic of Ireland (ROI); demographic profile, social class composition and labour market situation (Table 1; Fig. 1) [16]. Absolute and relative index scores represent composite measures of deprivation based on these components, calculated for each SA and measured on a single scale across all census periods (Supplementary Table S1). The absolute deprivation score reflects any changes to the national economy at SA level between census periods while the relative deprivation index score is a comparative measure of deprivation between SAs during a census period [16]. Deprivation index component data were obtained for two waves of the national census of Ireland (2011 and 2016) to correspond with the study period.

Urban/rural classification
A binary SA-specific settlement type variable was developed using data obtained from the Irish CSO ( Fig. 1b Table S2). The CSO settlement type dataset comprises 14 variables classified along an urban/peri-urban/rural scale ranging from 'city' (1) to 'rural' (14). The urban/rural variable was coded such that any classification which included a built-up area (classification 1-13) was recoded as 'urban', with remaining areas (classification 14) coded as 'rural'. All datasets were spatially integrated using a unique SA identifier in Stata V14 (StataCorp, College Station, Texas, USA).

Data analysis
Bivariate analysis Exploratory analyses of associations between individual deprivation components and the incidence of both infections in rural and urban areas were conducted via sequential bivariate logistic regressions. Statistically significant associations were assessed using a confidence level of 95% (P ⩽ 0.05), with explanatory variables for multivariate classification selected based on Bayesian Inference Criterion (BIC). The presence of collinearity between explanatory variables was diagnosed using the Variance Inflation Factor (VIF) and Spearman's rank (non-parametric) correlation coefficient (ρ). A VIF of ⩾5 and a Spearman's ρ of ⩾0.5 were considered indicative of potentially significant collinearity between individual deprivation components.

Random Forest classification
A suite of classification algorithms (support vector machines, Naïve Bayes, optimised generalised linear models and CART (decision trees)) were tested, with the Random Forest (RF) algorithm found to exhibit the highest classification performance. Calibrated parameters were selected using hyperparameter tuning and validated via an optimised iterative Breiman's RF parameter estimation algorithm in the randomForest package (R Foundation for Statistical Computing, Vienna, Austria). Parameters for STEC and cryptosporidiosis models, obtained from hyperparameter tuning, were set at a number of randomly sampled variables at each split (mtry = 3-4). RF models were calibrated using partitioned training datasets (70% of data) and 're-trained' using full datasets (100% of data) to ensure classification stability (i.e. change in variable importance between datasets). Attribute importance was

Epidemiology and Infection
assessed via the mean Gini value decrease, with larger values indicative of a greater contribution to infection classification.
The occurrence of cryptosporidiosis exhibited a negative association with household density (0.52 vs. 0.55; P < 0.001) in urban areas only, while a lower lone parent ratio (12.8 vs. 14.5; P < 0.001) and lower proportion of semi-skilled or unskilled workers (17.7 vs. 18.2; P = 0.008) were associated with infection irrespective of settlement type. Cryptosporidiosis was associated with lower proportions of residents with primary education only (13.6 vs. 14.5; P = 0.008) in urban areas only, whereas incidence across all settlement types was associated with a lower proportion of residents with third-level education (36.9% vs. 38.5%). Lower male (11.8 vs. 12.9; P < 0.001) and female unemployment rates (9.8 vs. 10.5; P < 0.001) were correlated with cryptosporidiosis in rural areas only.   Epidemiology and Infection 2011) (Fig. 3). A lower mean number of persons per room was associated with the presence of STEC enteritis within both urban and rural SAs, albeit this was the lowest relative contributor in all three models in which it appeared (Urban 2011, 2016; Rural 2016).

STEC enteritis
In urban SAs, higher age dependency rates (highest MDG = 681.1, Urban 2011) and lower percentages of local authority housing (highest MDG = 562.7, Urban 2011) were significant contributors to STEC classification, with neither of these attributes retained in

Cryptosporidiosis
Classification accuracy for cryptosporidiosis ranged from 82.0% (Urban 2016) to 90.6% (Rural 2011) (Table S6). Higher total SA populations (highest MDG = 574.9, Rural 2016) and age dependency rates (highest MDG = 550.9, Rural 2016) were significant attributes within all four classification models (Fig. 2). A lower household density was also significant within both rural and urban areas, although this attribute exhibited the lowest overall relative contribution to infection classification (highest MDG = 114.4, Urban 2016). Across both settlement types, rates of unemployment were significantly associated with the presence of cryptosporidiosis, with associations again alternating based on gender-specific rates and time period; female unemployment was significant within the 2016 Rural (MDG = 519.5) and Urban RF models (MDG = 369.7), while conversely, male unemployment was significant within both rural models

Discussion
The present study represents the first comprehensive investigation of individual socio-economic index components, as they relate to the incidence of two enteric infections in the ROI. Prior to the interpretation of study findings, the authors feel it is important to highlight some inherent limitations of the current study. As with any ecological analyses employing spatial areas as the primary study unit, it is crucial that caution is employed when interpreting study findings or replicating this approach. The Pobal HP deprivation index was designed and is employed as a measure of national-scale socio-economic deprivation, and as such, was not specifically developed for elucidating the transmission of or exposure to enteric pathogens and/or public health risk. Moreover, as both deprivation indices and infection data were aggregated to spatial units, examination of individual-level associations of infection with socio-economic elements is not possible, nor should study findings be interpolated to this level, i.e. findings are susceptible to ecological fallacy whereby affluent individuals (and cases of infection) may reside within a socio-economically deprived area, and vice versa. Comparison of findings for the two infections should account for differential management of secondary cases; while secondary cases were excluded from the STEC dataset, all cluster-related cases were included in the cryptosporidiosis dataset. This approach was taken partially to minimise spatiotemporal bias, and partially to account for differences in transmission. A significant majority of STEC clusters were consistently identified as such in the CIDR database across all geographical areas and throughout the data collection period, thus it was possible to consistently exclude secondary cases. Conversely, cryptosporidiosis clusters were not all identified in the earlier years of the 10-year dataset, and so secondary cases could not be excluded without introducing a spatiotemporal bias. Additionally, the route of transmission for STEC clusters was almost exclusively person-to-person, thus secondary cases contribute little to the statistical power of presented analyses. In contrast, the suspected route of transmission for cryptosporidiosis outbreaks was typically waterborne or animal contact, thus statistical source attribution is justified. The use of binary classification (i.e. RF) as opposed to a zero-inflated count (e.g. Poisson) model should minimise any potential effect this differential management of secondary infections had on presented findings.
Bivariate analyses reveal some similarities between several index components and the incidence of both STEC enteritis and cryptosporidiosis, which is likely reflective of inherent similarities between the two pathogens (e.g. sources, transmission pathways, etc.). Higher total population number was associated with both organisms in both urban and rural areas and across both census time periods. Higher local populations (and thus density) are often associated with economic deprivation, particularly in urban areas; previous studies have shown that the transmission of diarrhoeal diseases is typically higher among populations in more socially deprived urban areas [22]. Moreover, settlement patterns in Ireland are characterised by a burgeoning rural 'commuter belt', whereby rural populations exhibit a preference for living in the suburban periphery of large rural towns or urban connurbations due to high urban house prices and/or proximity to workplaces and schools [8]. Accordingly, 'peri-urban' populations, while characteristically rural with respect to local landuse and infrastructure, may be more susceptible to infection via secondary (person-to-person) transmission through increased contact with urban residents via childcare facilities and occupational exposures, in addition to being reliant on characteristically 'rural' infrastructure (e.g. private wells, septic tanks).
A higher age dependency rate was also associated with the occurrence of both infections. In Ireland, children under 5 years of age represent a high-risk group for STEC infection, with cryptosporidiosis also a leading cause of paediatric gastrointestinal morbidity [23]. The association between infection incidence and age dependency rate mirrors the demographic breakdown of cases, with young children (<5 years of age) and older adults (>65 years of age) historically characterised by the highest rates of infection [9,17]. Young children may be at increased risk due to frequent contact with other young children, lower standards of hygiene and inherent immunological susceptibility to infection [24], with onward transmission of infection to older members of households also attributed to higher rates of infection among children [25].
The incidence of STEC enteritis in rural areas was significantly associated with lower levels of third-level education, likely reflecting the prevalence of agriculturally based employment in rural Ireland, which is typically generational and not contingent on higher education. Notably, significant associations were also identified between the presence of both infections and lower rates of male and female unemployment in rural areas only. Lower levels of poverty, in concurrence with higher mean income, have been associated with STEC incidence in the USA and Japan [26,27], although this association has not been found across all regions (e.g. Finland [28], Canada [29]). The interplay between affluence and infection (notification) may be indicative of healthcareseeking behaviours among higher income sub-populations as suggested by Lal et al. [30] who report that increasingly deprived areas were inversely associated with the incidence of cryptosporidiosis in New Zealand. Further support is provided by a previous study of waterborne cryptosporidiosis in Ireland which reported that paediatric cases from agricultural backgrounds were less likely to have unemployed parents than a control group (children with gastroenteritis not associated with Cryptosporidium) [31].
Conversely, higher rates of female and male unemployment were associated with the presence of cryptosporidiosis in urban areas, suggesting that the relationship between affluence and cryptosporidiosis is not nationally uniform.
Lower mean household density was associated with the incidence of both infections in urban areas; this finding was unexpected based on previous studies which report household size and structural profile as a proxy for secondary transmission [32,33]. Household transmission and overcrowding are important risk factors for cryptosporidiosis, albeit more frequently reported within low-and middle-income countries [34]. The association between enteric infections and urban household density identified in the current study may indicate elevated infection risks among affluent demographic groups. Likewise, a lower proportion of local authority housing was associated with STEC presence in urban areas, once again potentially reflecting the nexus between pathogenic exposure and economic affluence. Previous studies have cited international travel and food consumption patterns associated with higher socio-demographic status (e.g. higher intake of fresh fruit, vegetables and unprocessed produce) as risk factors for both STEC enteritis and cryptosporidiosis [27,28].
Education represents a key indicator of socio-economic status, and by extension, overall population health, with increased educational attainment typically associated with a lower incidence of non-communicable disease [35,36]. The current study identified significant associations between infection and areas characterised by higher proportions of (a) primary education only and (b) a third-level education; thus, higher and lower educational attainment were significantly associated with the incidence of both infections at the local level. A recent review by Newman et al. [37] reports significant, non-ordered variation between educational attainment and the incidence of STEC enteritis. In Ireland, younger cohorts are more likely to have received a third-level education than their predecessors, due to the introduction of free secondary school education in the late-1960s, thus the proportion of people with primary education only is appreciably higher among older people [38], as such, lower educational attainment may represent a potential proxy measure for immunological status. Conversely, previous work reports that higher infection risks are associated with higher levels of education [39], with higher education serving as a proxy for increased affluence, which as previously mentioned is often associated with international travel and consumption of unprocessed foods [27,28].
Overall, infection classification based on RF models using deprivation index components was generally considered moderately accurate; based on the published literature, this (unsurprisingly) suggests that the incidence (and thus mechanics) of infection within spatial units cannot be predicted solely based upon socio-economic metrics, with environmental factors, for example, likely to significantly strengthen predictive capacity, particularly in rural areas. Environmental drivers of STEC enteritis and cryptosporidiosis typically exhibit marked differences with respect to settlement type, for example, the incidence of STEC enteritis in rural areas has been associated with direct (farm) animal contact and contaminated drinking water, while contaminated food products have been reported as a significant driver in urban areas [10,39,40,41]. Notwithstanding, full (i.e. 100% datasets) RF models presented here did exhibit 50-80% 'truepositive' classification (i.e. sensitivity), thus there is little doubt that both infections are driven, to some extent, by local socioeconomic profile in Ireland. Accordingly, the authors recommend that future work focus on increasingly prospective approaches (in parallel with retrospective, data-driven studies), and employ a variety of metrics/indices for the classification of socio-economic profile. Moreover, the authors strongly recommend that future research should seek to incorporate socio-economic, environmental (including climactic) and behavioural drivers of infection to provide increasingly holistic models of infection incorporating the myriad sources, pathways and receptors potentially associated with transmission.

Conclusion
Increased local age dependency rates and total population number were associated with the occurrence of both infections, irrespective of settlement type, highlighting population density (person-to-person contact) and vulnerable sub-populations (i.e. immunological status, infection severity) as important transmission drivers. Numerous heterogeneities were identified with respect to 'place' as it relates to both settlement type and socioeconomic profile, and particularly those associated with education, unemployment and household composition. Lower rates of third-level education, unemployment and household density were associated with infections in urban areas only. Differences were also observed between socio-economic components and infection aetiology; lower proportions of lone parent households and higher proportions of semi-skilled and unskilled workers were associated with the incidence of cryptosporidiosis, but not STEC enteritis. Future work should incorporate environmental (including climactic), socio-economic and epidemiological data to increase current understanding of spatially-specific exposure to and transmission of enteric infections, thus permitting improved evidence-based surveillance, targeted intervention design and mitigation strategies, both nationally and internationally. Data availability statement. The human infection data that support the findings of this study are available from the Computerised Infectious Disease Reporting (CIDR) Committee (https://www.hpsc.ie/cidr/), and require formal permission following application by researchers for individual studies. Restrictions apply to the availability of these data, which were used under licence for this study. The urban/rural classification used to support the findings of this study is freely available from the Central Statistics Office (Ireland) (https://www.cso.ie/en/releasesandpublications/ep/p-cp3oy/cp3/urr/). The Pobal HP deprivation index data used to support the findings of this study are freely available, and may be accessed from http://trutzhaase.eu/deprivation-index/the-2016-pobal-hp-deprivation-index-for-small-areas/.