Development of prediction models for upper and lower respiratory and gastrointestinal tract infections using social network parameters in middle-aged and older persons -The Maastricht Study-

The ability to predict upper respiratory infections (URI), lower respiratory infections (LRI), and gastrointestinal tract infections (GI) in independently living older persons would greatly benefit population and individual health. Social network parameters have so far not been included in prediction models. Data were obtained from The Maastricht Study, a population-based cohort study (N = 3074, mean age (±s.d.) 59.8 ± 8.3, 48.8% women). We used multivariable logistic regression analysis to develop prediction models for self-reported symptomatic URI, LRI, and GI (past 2 months). We determined performance of the models by quantifying measures of discriminative ability and calibration. Overall, 953 individuals (31.0%) reported URI, 349 (11.4%) LRI, and 380 (12.4%) GI. The area under the curve was 64.7% (95% confidence interval (CI) 62.6–66.8%) for URI, 71.1% (95% CI 68.4–73.8) for LRI, and 64.2% (95% CI 61.3–67.1%) for GI. All models had good calibration (based on visual inspection of calibration plot, and Hosmer–Lemeshow goodness-of-fit test). Social network parameters were strong predictors for URI, LRI, and GI. Using social network parameters in prediction models for URI, LRI, and GI seems highly promising. Such parameters may be used as potential determinants that can be addressed in a practical intervention in older persons, or in a predictive tool to compute an individual's probability of infections.


Introduction
Over the last decade, population ageing has become a global issue [1]. Worldwide, the proportion of people aged 60 and over is growing rapidly, and it is expected to rise to one-quarter of the populations in Europe and North America in 2020 [1,2].
Infectious diseases are a major challenge in healthcare of the older persons [3], due to increased susceptibility to infections caused by an age-related compromised immune system [4]. Older persons have decreased cell-mediated immunity and decreased antibody production to new antigens [3,5]. Pneumonia and influenza are among the 10 major causes of death in the older persons [3]. The incidence and severity of community-acquired upper respiratory tract infections (URI), lower respiratory tract infections (LRI), and gastrointestinal tract infections (GI) are often higher than in other age groups.
To date, we lack evidence on non-pharmaceutical interventions to prevent infections in older persons living at home. Current EU policy expects promotion of active ageing and solidarity between generations, guiding principles include participation in society and support for informal caregivers. Hence older persons are expected to take care of themselves as much as possible with the help of their social network including informal carers [6]. A new prevention strategy may fit this policy. Therefore, we focus on the possible contributions of personal social contact networks for improving prevention strategies. Transmission and acquisition of an infectious disease are for a large part determined by social networks [7][8][9][10][11], as social relationships may act as a vehicle for the transmission of infections. Diverse and large social networks are associated with close contact with a broad range of people and hence an increased risk of exposure to a range of infectious agents [7]. This increases risk of acquiring disease, particularly among more vulnerable people, whose resistance is compromised (e.g. older persons or people with comorbidities) [7]. Social networks on the other hand are shown to have a strong influence on a person's health, well-being, and self-management, and are thought to be a promising target for effective infection prevention interventions [12]. It has been shown that social networks can act as a buffer for infections by increasing immune function [8,9]. Especially for older persons and persons with chronic disease, social networks can provide the necessary support to enable them to live independently. As such, higher levels of social support are found to have a positive association with better self-management behaviours of chronically ill [13]. Foremost, social networks and their characteristics are highly useful in novel interventions as networks can be managed by an individual older person and by their formal and informal caregivers who are all part of the same network. Most social network interventions use social networks to accelerate behaviour change or improve organizational performance, knowing different strategies and multiple tactical alternatives [14,15]. For example, peer-based interventions were shown to have positive effects on physical activity, smoking cessation, and condom use [15]. However, by our knowledge, neither prediction models with individual risk assessment, nor the specific social network parameters of personal social contact networks as determinants have been used in social network interventions so far.
Yet, to date, it is not fully understood which social network parameters are related to the risk of infections, whether these relations differ by type of infection, and whether these parameters can be used to predict an individual's probability of an infection. More insight into these issues is needed for the development of effective infection prevention programmes. Prediction models are useful for providing such insight, and they make individual risk assessment possible. However, previous attempts to develop prediction models for individual incidences of respiratory tract infections (RI) or GI, based on demographic, environmental, and lifestyle information showed only poor to moderate predicting power [16]. To the best of our knowledge, the role of social networks in predicting infectious diseases in middle-aged and older persons has not yet been studied using prediction models. Therefore, the aim of the current study was to develop and internally validate prediction models for URI, LRI, and GI in a large group of independently living middle-aged and older persons based on a range of variables including social network parameters. We hypothesize that the application of such prediction models can help in deciding about concrete infection prevention strategies for patient self-management, personalized healthcare, and home care. A better choice of prevention strategies might contribute to lowering the infectious burden and its associated risks in the growing group of older persons.

Study population
We used data from The Maastricht Study, an observational prospective population-based cohort study. The rationale and methodology have been described previously [17]. In brief, the study focuses on the aetiology, pathophysiology, complications, and comorbidities of type 2 diabetes mellitus (T2DM) and is characterized by an extensive phenotyping approach. Eligible for participation were all individuals aged between 40 and 75 years and living in the southern part of the Netherlands. Participants were recruited through mass media campaigns and from the municipal registries and the regional Diabetes Patient Registry via mailings. Recruitment was stratified according to known T2DM status, with an oversampling of individuals with T2DM, for reasons of efficiency. The present report includes cross-sectional data from the first 3451 participants, who completed the baseline survey between November 2010 and September 2013. The study has been approved by the institutional medical ethical committee (NL31329.068.10) and the Minister of Health, Welfare and Sports of the Netherlands (Permit 131088-105234-PG). All participants gave written informed consent. In the present analysis, all participants that received the social network questionnaire (N = 3074) were included.

Measurements
Social network questionnaire Data on individual social networks were collected through an online questionnaire using a name generator method [18,19]. The assessment of the social network covered contacts (interactions between persons) within a period of 6 months. The participants received a questionnaire with seven questions on different types of contacts and were asked to name a maximum of five persons (network members) per question. Questions concerned (1) persons who advised them on problems, (2) persons who could offer them practical help if they were sick, (3) persons who provided emotional support when they were feeling unwell, (4) persons who helped them with small and larger jobs around the house, (5) persons they visited for social purposes or that they could go out with sometimes, and (6) persons with whom they could discuss important matters and, finally, (7) participants were asked to name a maximum number of 10 additional persons who were also important for them because of mutual activities. In total, participants could name a maximum number of 40 network members. After every question and for each network member named, they were asked to indicate their frequency of contact with this person over the last 6 months (daily or weekly, monthly, quarterly, and half-yearly). Moreover, the participants were asked to identify their relationship to this person (e.g. partner, sister, friend, neighbour, etc. (28 options)), how far away this person lived (walking distance, less than half an hour away by car, more than half an hour away by car, more distant), and to indicate this person's sex and actual or estimated age.
Further, participants were asked to rate two statements on a five-point Likert scale ranging from strongly agree to strongly disagree: 'most of my friends know each other' and 'my best friends know my family'. The participants were also asked whether they were a member of a club (yes/no, e.g. sports club, religious group, volunteer organization, discussion group, self-support group, Internet club, or another organization).

Parameters of the social network
The network parameters were computed from the questionnaire. A detailed definition of the network parameters is presented in Table 1. In brief, network size was defined as the total number of unique network members mentioned in the questionnaire. Total contacts per half year was defined as the sum of all contacts per half year. The percentage of network members that were of the same age, that were household members, that lived within walking distance, <½ h away by car, >½ h away by car, and the percentage of network members that were family members, friends, or acquaintances was computed. Club membership was defined as membership in, for instance, a sports club, religious group, or other organization. Density was defined as the extent to which network members know each other. Moreover, participants were asked to indicate the number of members (maximum of five) who provided informational support, emotional support, and practical support.

Outcome variables
We used self-administered questionnaires to measure the occurrence of community-acquired URI, LRI, or GI over the 2-month period before completing the questionnaire. Moreover, we recorded the season in which the reported symptomatic infections occurred. The symptoms 'runny nose' and 'sore throat' were pooled as indicators of URI. Influenza, pneumonia, and fever were pooled as indicators of LRI. Vomiting with fever and diarrhoea were pooled as GI.

Candidate predictors
In the literature, we identified several general variables and social network parameters that had previously been examined in relation to infections [3, 5, 7-12, 16, 24, 25]. Based on this extensive literature search, we included 52 variables as potential predictors, of which 26 network variables and 26 general variables. All candidate predictors are described in Tables 1-3.

Model development
Missing information on potential general predictor variables (0-26%) was imputed using stochastic regression imputation, since complete case analysis may bias results and can cause a decrease in sample size [26]. The imputations were drawn using predictive mean matching, which ensures only realistic values are imputed that are observed elsewhere in the data [27]. Information on missing values of potential general predictor variables can be found in Table 2.
Per infection, we added all potential predictor variables to a logistic regression model. We used stepwise backward elimination based on the Akaike Information Criterion for variable selection, which is a goodness-of-fit measure that penalizes the model fit for model complexity [28]. As a result, predictors included in the model do not necessarily have a P-value of 0.05 or lower. We used restricted cubic splines to test whether continuous variables were non-linearly associated to the log-odds of experiencing an infection, and tested for statistical interactions of the social network parameters with sex, age, and type 2 diabetes.
We determined the performance of each of the prediction models by quantifying measures of discriminative ability and calibration. A model's discriminative ability refers to its ability to discriminate between those who developed an infection over the course of 2 months and those who did not develop an infection, and is expressed as the AUC, which is the area under the receiver operating characteristic curve. The AUCs will be tested against the null-hypothesis that the AUC is 50%, which is no more than flipping a coin. Calibration refers to the agreement between predicted probabilities and observed probabilities. To assess calibration, we visually inspected a calibration plot and applied the Hosmer-Lemeshow (HL) goodness-of-fit test. An HL test that yields a P-value of 0.05 or lower is considered to indicate poor calibration. As we were especially interested in the prediction of infections in older persons, we computed the AUC for all models applied to persons of 60 years and older, and for persons who were younger than 60 years old.
As a sensitivity analysis for the imputation procedure, we computed the three models for the dataset of complete cases only, to judge whether the AUCs differed to any clinically relevant extent.

Model validation
It is a well-known phenomenon that prediction model performance degrades when applied to new persons who were not used to develop the model [29]. Often, predictions derived from a model are too extreme (i.e. persons at low risk are predicted too low, and vice versa). To estimate the performance of the prediction models in data involving new persons, and to counteract the too extreme predictions in the future, we performed an internal validation step. For each prediction model, we drew 1000 bootstrap samples. On each sample, model development was repeated and the performance (measured by AUC) of those bootstrap models was calculated on both the bootstrap sample as well as in the original sample. The average difference in performance between the bootstrap sample and the original sample is the estimate of the optimism in model performance. This optimism can subsequently be subtracted from the initial performance measures. In addition, the bootstrap routine yields a shrinkage factor. The original regression coefficient can be multiplied by the shrinkage factor. As the shrinkage factor has a value between 0 and 1, the regression coefficients are shrunk towards zero, and future predictions are less extreme [30]. To identify the proportion of network members who are of the same age as the participant, we calculated the difference between the participants' age and the network members' age for every network member named. Next, we computed the proportion of same age (±5 years) network members for each participant

Heterogeneity
Sex heterogeneity (IQV, range 0-1) To assess sex heterogeneity within the participants' network, we computed the Index of Qualitative Variation (IQV) [40]. This index indicates the probability that two randomly chosen network members belong to the same category. The IQV is defined as the ratio of observed differences divided by maximum differences, where '0' represents a fully homogeneous and '1' a fully heterogeneous network [40]. Observed differences were calculated through multiplication of the total number of men by the total number of women. We calculated maximum differences as (network size/2) 2 [40] Type of relationship  Tables 2 and 3. The general and social network characteristics broken down for infection status were presented in online Supplementary Tables S1 and S2. The restricted cubic spline regression did not reveal non-linear associations between continuous variables and the log-odds of experiencing any of the three types of infections, nor did we find any statistically significant interactions between sex, age, or type 2 diabetes and network parameters.     Table 4 shows the coefficients and odds ratios (ORs) of the prediction model for URI. The AUC of this model was 64.7% (95% confidence interval (CI) 62.6-66.8%). The model was based on 16 predictors, of which nine network parameters and seven general predictors. Smoking, BMI, problems with daily activities, and emotional support were positively related to URI, while age, season, total friend contacts per half year, the proportion of network members who are household members, who are living within walking distance, who are living <½ h away by car, proportion of same-age network members, proportion of network members who are family members, density between friends and family, and practical support showed an inverse relationship with URI. Table 5 shows the coefficients and ORs of the prediction model for LRI. For this model, the AUC was 71.1% (95% CI 68.4-73.8). The model was based on 14 predictors, of which five network parameters and nine general predictors. BMI, problems with daily activities, depression, the proportion of network members living >½ h away by car, the proportion of network members who are friends, and the proportion of network members who are acquaintances were positively associated with LRI, while age, high or low educational level, season, the proportion of same-age network members, and informational support were negatively associated with LRI. The AUC of the prediction model for GI was 64.2% (95% CI 61.3-67.1%) ( Table 6). The model was based on 12 predictors, of which six network parameters and six general predictors. Problems with daily activities, depression, MMSE score, type 2 diabetes, mental healthcare consumption, network size, and the proportion of network members living >½ h away showed positive associations with GI, while paramedical healthcare consumption, proportion of same-age network members, proportion of network members who are family members and acquaintances, and practical support showed an inverse association with GI. See Table 7 for a summary of the associated social network parameters.
The sensitivity analysis on only complete cases yielded AUCs that did not differ more than 1.4% (data not shown).
When the models were applied to persons of 60 years and older, and subsequently to persons younger than 60 years, the AUCs were comparable to the whole group. For upper RI this was 64.0 (95% CI 61. Online Supplementary Figure S1 shows the calibration plots for the three prediction models. All plots show good agreement between predicted probabilities of an infection, and the actual, or observed frequency of infections. Furthermore, the HL goodness-of-fit test yielded a P-value of 0.30, 0.12, and 0.25 for the models URI, LRI, and GI, respectively, verifying that the models are well calibrated.
The formula to compute an individual's probability of an infection in a period of 2 months can be found in the online Supplementary Material.

Internal validation
The internal validation step yielded a shrinkage factor for each prediction model. This shrinkage factor was used as a correction factor for the regression coefficients. Tables 3-5 show the shrunken regression coefficients and the re-estimated intercept. Using these coefficients for calculating the probability of an infection for future patients will less likely result in too extreme predictions compared with the coefficients of the initial models.
In addition to a prediction model-specific shrinkage factor, the internal validation yielded a measure of optimism in the estimation of the AUC of each model. The optimism in the AUC was 1% for URI and LRI, and 2% for GI. Hence, we expect that the discriminative ability of these models when applied to new patients will be 63.7%, 70.1%, and 62.2%, respectively.

Discussion
To the best of our knowledge, this study is the first attempt to develop prediction models for URI, LRI, and GI including social network parameters as potential predictors. This study describes the development and internal validation of three prediction models for symptomatic infections in a period of 2 months: URI, LRI, and GI. The models were able to discriminate between those who experienced an infection and those who did not, and had good  calibration. The main finding was that the social network parameters are strong independent predictors for infections in middle-aged and older persons. Moreover, most social network parameters had a beneficial association with the three infections. As such, social network parameters are likely to be highly promising concepts in future infection prevention strategies in older persons living at home. This study showed that the preventive potential of the social network parameters is twofold.
Combined with other factors such as season and problems with daily activities, the beneficial social network parameters may be used as potential determinants that can be reinforced by preventive interventions, and all social network parameters may be used in a predictive tool to compute an individual's probability of an infection. Prior to the development of such strategies or a tool, prospective external validation could be encouraged [31]. We do expect that external validation of the models would provide similar results as the shrinkage factors and optimism estimates in our models were very small. In the development of the prediction models, we focussed on social network parameters as it has been shown that social networks can act as a buffer for infections by increasing immune function but can also act as a vehicle for the spread of infections [7][8][9]; also, previous attempts to develop a prediction model based on demographic, environmental, and lifestyle characteristics alone explained only a relatively small proportion of the occurrence of respiratory infections or GI [16].
The results of the present study showed detrimental as well as beneficial associations of the social network, and in all models, social network parameters were strong independent predictors for infections. Simplified, results indicate that infection risk is higher with a higher number of social network members (greater social network size), and with higher levels of emotional support. The latter seems surprising; however, it may indicate that host resistance of persons with a higher need of emotional support is compromised, as it has been shown that infection risk was higher among those with more stressful life events [7]. A likely explanation for our findings is that a larger network indicates exposure to a greater range of infectious agents, and therefore leads to a greater incidence of symptomatic infections. In addition, the likelihood of meeting an infected person is higher in a large network. Yet, most social network parameters assessed are negatively associated with an individual's probability of infections; preventive factors include close geographic proximity (persons living nearby), more network members of the same age, higher proportion of family members, more contact moments with friends, receiving more informational and practical support, and friends and family knowing each other. Previous research has shown that the family is an important source of social support [32], and higher levels of social support have been shown to enhance several aspects of immune function [33,34]. A possible explanation for our findings is that the participant's close social network may act as a buffer for infections, indicating a positive impact on lower susceptibility to these infections.
There was some overlap between the models, as well as between infections reported. Therefore, we checked whether we could combine URI and LRI, and URI, LRI, and GI in combined prediction models. Yet, AUCs were substantially lower when combining the infections.
The use of social network assessment in the prevention of infectious diseases may be a promising target in personalized care for the middle-aged and older persons population. Social network parameters can be used twofold, namely directly to predict the probability of infections in a predictive tool, and indirectly in preventive intervention programmes by addressing the beneficial parameters of the social network or their counterparts. Yet, most network interventions were aiming to accelerate behaviour change, many of them using peer-based interventions [14,15]. The present study adds new insights in possibilities to make use of the social network in prevention strategies. A summary of the associated social network parameters and an indication of their potential use in preventive intervention programmes is depicted in Table 7.
We currently face a gap in the management of infections in older persons: a growing population [1], living longer [2], and being more susceptible to infections [3,5], demanding increasing healthcare due to infectious burden. If we would be able to slightly lower the mean level of exposure, we might have more health impact at population level ('the population strategy') compared with individual treatment of patients (a much smaller group) [35]. Our results may inform feasible and effective infection control, and better self-management in older persons contributing to 'healthy ageing' of the population. Our results agree with the current EU policy that expects older persons to take care of themselves as much as possible with help of their social network [6]. A new prevention strategy may fit this policy by reinforcing the beneficial characteristics of the social network in older persons.
One strength of our study is that it includes a broad range of social network parameters in the development of a prediction model for URI, LRI, and GI, which has not been done before. Another strength is the internal validation procedure. Using shrinkage factor coefficients for calculating the probability of an infection for future patients will less likely result in too extreme predictions. Furthermore, we only had few missing values on most general predictors and the records that were incomplete were imputed. Although we observed 25.6% missing values on income, we assumed the data were missing at random, which means that the probability of missing is related to observed covariates. We used a large amount of variables from the cohort for the imputation model. Our sensitivity analysis showed no clinically relevant differences in AUC when the models were estimated on complete cases only. We did not use complete-case analysis for the main analysis since the assumptions are more strict and thus is more likely to yield biased results and can cause a decrease in statistical power compared with using imputation methods [26]. Moreover, we did not dichotomize continuous predictors, as this may result in loss of information and reduction in statistical power [36].
Nevertheless, this study also has limitations. First, our data were of cross-sectional nature. External validation could be encouraged in prospective data to rule out reversed causality. Nonetheless, as our network assessment covered the past 6 months and infections in the past 2 months, it is highly unlikely that reverse causation would play a role and would have strongly biased our results. Second, self-reporting may be subject to bias. Although the self-reporting of infections has been used successfully in the past in relation to network assessment [10,37], symptoms may be under-or over-reported. However, we focused on symptomatic infections, which may be favourable compared with laboratory assessment, as we only include infections that were experienced as 'illness', and therefore contribute to the perceived infectious burden in middle-aged and older persons. Third, we had seven events per predictor in LRI and GI, while 10 events (infections reported) per predictor variable are recommended [38]. However, we performed internal validation of the models to prevent overfitting that may be induced by <10 events per variable in LRI and GI. Another limitation of this study was missing information on degree of urbanization, as this variable has also been shown to associate with respiratory infections [39]. However, the study area is defined by postal codes,~60% of the population lives in an urban setting, and~40% lives in a suburban/rural setting [17].

Conclusions
To conclude, the use of social network parameters in prediction models for URI, LRI, and GI seems highly promising. In the present study, we used candidate predictors that were easily measurable in practice, and may potentially be used in a practical intervention. Based on the models' discriminatory capacity and accuracy, results could be used directly to estimate a risk for infection given a defined set of parameters, and indirectly in intervention programmes by addressing the beneficial parameters of the social network. Thereby, the use of social network-based prediction models in the prevention of infections in middle-aged and older persons may result in high benefits on a population level.
Supplementary Material. The supplementary material for this article can be found at https://doi.org/10.1017/S0950268817002187.