Social, demographic and behavioural determinants of SARS-CoV-2 infection: a case-control study carried out during mass community testing of asymptomatic individuals in South Wales, December 2020

Between 21 November and 22 December 2020, a SARS-CoV-2 community testing pilot took place in the South Wales Valleys. We conducted a case-control study in adults taking part in the pilot using an anonymous online questionnaire. Social, demographic and behavioural factors were compared in people with a positive lateral flow test (cases) and a sample of negatives (controls). A total of 199 cases and 2621 controls completed a questionnaire (response rates: 27.1 and 37.6% respectively). Following adjustment, cases were more likely to work in the hospitality sector (aOR 3.39, 95% CI 1.43–8.03), social care (aOR 2.63, 1.22–5.67) or healthcare (aOR 2.31, 1.29–4.13), live with someone self-isolating due to contact with a case (aOR 3.07, 2.03–4.62), visit a pub (aOR 2.87, 1.11–7.37) and smoke or vape (aOR 1.54, 1.02–2.32). In this community, and at this point in the epidemic, reducing transmission from a household contact who is self-isolating would have the biggest public health impact (population-attributable fraction: 0.2). As restrictions on social mixing are relaxed, hospitality venues will become of greater public health importance, and those working in this sector should be adequately protected. Smoking or vaping may be an important modifiable risk factor.


Introduction
There is growing evidence that certain population groups are more likely to be affected by severe COVID-19. These include older people, males, pregnant women and people with preexisting chronic disease or disability [1][2][3][4]. People in certain minority ethnic groups and those in public-facing occupations are also disproportionally affected [5][6][7][8], but this is a combination of the risks of acquisition and progression to severe disease.
A proportion of SARS-CoV-2 infections will present as asymptomatic or mild infections, particularly in younger people [9,10], so studies of risk factors for acquiring infection based on those hospitalised will be biased. Compared to evidence on risks of severe infection, limited information is available on the social, demographic and behavioural factors associated with transmission of SARS-CoV-2 infection in the community. Information gathered through the Test, Trace, Protect programme focuses on forwards contact tracing rather than factors associated with acquisition of infection.
A pilot mass testing exercise was initiated in South Wales. Whole borough testing took place in Merthyr Tydfil (population approximately 60 000) [11] between 21 November and 20 December 2020, and was extended to lower Cynon Valley in Rhondda Cynon Taf County Borough Council (an area of about 25 000 population covering five electoral wards) from 5 to 22 December 2020. This was the second such initiative in the UK, after a pilot scheme in Liverpool [12], and the first in Wales. Testing was offered at community settings to asymptomatic people aged 11 and over living, working or studying in the two areas. Symptomatic people were asked to seek tests through other routes. A total of 47 619 lateral flow tests (LFTs) were carried out at 12 testing centres in Merthyr Tydfil and at eight testing centres in the Lower Cynon Valley. Of these, 1135 (2.4%) were positive. People taking part were older than those in the catchment areas, and more tests (55%) were carried out in women.
Rates of confirmed COVID-19 in this relatively deprived, former industrial area of the South Wales Valleys, have been consistently high [13]. This testing exercise presented an opportunity to conduct an epidemiological study to obtain information on factors associated with testing positive for SARS-CoV-2 in a high incidence setting, in order to inform the ongoing response.

Study design
Unmatched case-control. Target population was adults (18 years and over) living, working or studying in Merthyr Tydfil County Borough or electoral wards in the lower Cynon Valley, Rhondda Cynon Taf County Borough selected because they were areas of persistently high incidence. The study population was adults (18 years and over) attending community testing for at least one LFT. Cases were defined as all people attending community testing pilot receiving a positive LFT result. Controls were a sample of those with a negative LFT result.

Recruitment of cases and controls
Data on the results of LFT were de-duplicated to provide the first LFT for each person. These data contained the test result and the mobile phone number which was provided on registration when attending for testing. Rolling recruitment was carried out during the mass testing period. We contacted all cases and for each case, we generated a random sample of 10 individuals from the list of individuals who were tested on the same day but had a negative test result (controls).

Data collection
A questionnaire was designed in the software tool Smart Survey [14]. All newly tested individuals with a positive result (cases) and the sample of negatives (controls) were sent a SMS text message (see Supplementary Material 1) through the government portal texting service 'notify.gov' [15] asking them to complete an anonymous self-administered online questionnaire accessed via a hyperlink. To distinguish between cases and controls, a different link was sent to each group. We asked 37 questions on demographic and social factors, including age, ethnicity and occupation, area of residence, household structure, caring responsibilities and social interactions in the previous 10 days (see Supplementary Material 2).

Analysis
Analysis was carried out using Stata v14 [16]. Response rates for cases and controls were calculated. The age distribution of cases responding was compared to all cases, and the age distribution of controls was compared with the sample selected for recruitment using Spearman rank test. We also compared controls to all people attending mass testing who had a negative result.
We constructed a directed acyclic graph to inform the analysis. Having symptoms was excluded from the multivariable analysis as this considered not to be in the causal pathway. Also, being in contact with a known COVID-19 case was excluded from multivariable analysis, as this would underlie all other associated factors.
Variables were grouped into four categories: (i) personal characteristics, (ii) occupational exposures, (iii) household exposures and (iv) social exposures. Unmatched univariate analysis was carried out using Stata v14 to identify social and demographic factors associated with testing positive. Odds ratios (OR) with 95% confidence intervals (CI) were calculated for each exposure variable using logistic regression. Small area deprivation status was assigned to cases and controls using their area of residence. Deprivation quintiles were calculated based on the distribution of Welsh Index of Multiple Deprivation [16] assigned to lower super output areas (LSOA) in Wales. Each participant was then classified into a deprivation quintile based on their LSOA of residence.
Multivariate analysis was then carried out by logistic regression to take account of potential confounders or effect modifiers, identified a priori or in the univariate analysis. First, all exposures were adjusted for all other exposures within each of the four exposure categories (i) to (iv). Those variables that remained significant at P < 0.05 were included in a further final multivariate analysis to identify those factors most important in predicting risk of infection, with a forward step-wise approach using Akaike information criterion as a comparative fit statistic. Due to collinearity between 'place of work' and 'key worker' fields, three new binary fields were created from the 'key worker' field: 'health and social care worker', 'transport worker' and 'public service worker'.
Lastly, to assess the public health significance of the exposures identified through multivariable analysis, we calculated population-attributable fractions with 95% CI for those exposures that remained positively associated with testing positive after adjustment using punafcc post-estimation command in Stata [17]. In order to use the punafcc command it was necessary to recode the categorical 'place of work' field to create three new binary variables for 'working in a healthcare setting', 'working in a social care setting' and 'working in a hospitality setting'. Other variables of interest were already in a binary format. Adjusted ORs were plotted against population-attributable fractions to investigate the relationship between personal risk and public health impact.

Ethics approval
We carried out his study to inform the ongoing epidemic response; and as such this study was determined by Public Health Wales' Research and Development Office to be usual public health practice covered by Public Health Wales' Establishment Order, and covered by the COVID-19 privacy statement issued for the testing pilot [18]. Data were held and processed under Public Health Wales' information governance arrangements in compliance with the Data Protection Act, Caldicott Principles and Public Health Wales guidance on the release of small numbers. No data identifying protected characteristics of an individual were released outside Public Health Wales. Notify is a UK Government run platform which is a secure mass texting service. Notify is compliant with the Data Protection Act and any user data uploaded (e.g. phone numbers) are deleted after 7 days. Data which pass through the system are encrypted. Notify has been assessed and approved by the Cabinet Office Senior Information Risk Officer (SIRO). The SIRO checks this approval once a year. Notify is suitable for sending messages classified as 'OFFICIAL' or 'OFFICIAL-SENSITIVE' under the Government Security Classifications policy.

Patient and public involvement statement
The study was set up rapidly, and neither patients nor the public were involved in the design and conduct of the case-control study.
However, the questionnaire was piloted on local residents, and the study was closely linked to the evaluation of the whole area testing programme which included input from a wide range of stakeholders, including local community representatives. Headline findings from the study were shared with local residents by Cwm Taf Morgannwg University Health Board via media and social media [19].

Response
SMS messages were sent to 735 positives and 6970 negatives aged 18 years or over and for whom we had a valid phone number for. There were a total of 4409 questionnaire attempts but only completed questionnaires were exported from the survey tool. A total of 199 cases and 2621 controls were recruited, giving response rates of 27.1 and 37.6% respectively.
Cases had a similar age distribution to all people testing positive during the pilot (Spearman's rank correlation, P = 0.07). Negative controls recruited had the same modal age group (50-59 years) as those selected to take part, but older people were over-represented in the control group (Spearman's rank correlation, P = 0.01). Negative controls had a similar age distribution to all people attending mass screening who tested negative (Spearman's rank correlation, P = 0.07).

Symptoms
Nearly all (99.6%) of people attending the testing pilot reported being asymptomatic at the point of test registration. However, at the time of questionnaire completion, 87 of 198 (44%) cases taking part in the study reported symptoms compatible with COVID-19 (loss of sense of smell/taste, a new ongoing cough, or a fever) indicating that a proportion of those testing positive were pre-symptomatic.

Factors associated with a positive LFT
Cases were more likely to be in younger age groups (Table 1). Only small numbers of cases (<10) and controls (81) classified themselves as being in an ethnic group other than white-British or Irish. Cases were slightly more likely to be in a Whiteother ethnicity (OR 1.23), but this was not statistically significant. The majority of cases and controls lived in areas classified as within the three most deprived quintiles. Cases were slightly more likely to live in the most deprived areas and slightly less likely to live in the least deprived areas but this effect did not reach statistical significance (Table 1).
Most cases and controls were resident within the catchment, but cases were less likely to be resident inside the catchment area (OR 0.62, 95% CI 0.45-0.87). Twenty-two per cent of cases reported smoking or vaping compared to 16% of controls (OR 1.51, 95% CI 1.06-2.14). Twenty-six per cent of cases (51/193) were either not working or were working from home, as compared to 42% of controls. Compared to those not currently working or working from home, cases were more likely to work in a social care setting (OR 3.07, 95% CI 1.53-6.14), in a healthcare setting (OR 2.39, 95% CI 1.48-3.87), in hospitality (OR 5.41, 95% CI 2.63-11.12), in an office (OR 2.36, 95% CI 1.48-3.82), in prison (OR 20.65, 95% CI 1.27-334.82), or in an 'other' setting (OR 3.64, 95% CI 1.46-9.07). In those who worked, cases were less likely to work from home (OR 0.43, 95% CI 0.52-0.73) ( Table 2).
Univariate analysis of household exposures (Table 3) indicates that cases were more likely to live in larger households (OR for living with six or more people 4.43, 95% CI 1.79-10.95, using living alone as a reference), were more likely to live with a child aged under 11 years (OR 1.41, 95% CI 1.01-1.97), were more likely to live with someone aged 23-59 years (OR 1.60, 95% CI 1. 16-2.19) and were more likely to live with a healthcare worker (OR 1.60, 95% CI 1.08-2.37). Cases were less likely to live with someone aged 60 years or over (OR 0.63, 95% CI 0.44-0.90) or live with someone working in education (OR 0.52, 95% CI 0.27-0.99).
Although only a small number of respondents visited a pub in the preceding 10 days (8 cases, 38 controls), this was significantly associated with infection (OR 2.85) ( Table 4). Cases were significantly less likely to have had household visitors, and were less likely to visit a shop or supermarket. Cases were not more likely to have caring responsibilities for someone outside their household. Cases were significantly less likely to have attended a face-to-face healthcare appointment in the preceding 10 days.
Cases were more likely than controls to report having been in contact with someone who has been told that they have a positive COVID-19 test in the last 10 days (OR 2.23, 95% CI 1.63-3.05), and more likely to report someone in the household currently self-isolating because they had been in contact with someone with COVID-19 (OR 4.32, 95% CI 3.09-6.06). It is possible that if there was a delay in the case completing the questionnaire, the case may be reporting someone self-isolating due to contact with themselves, but our assumption is that this relates to contact with another confirmed case.

Population-attributable fractions
Population-attributable fractions were 0.040 (95% CI 0.020-0.059) for working in the hospitality sector, 0.033 (95% CI 0.011-0.055) for working in a social care setting, 0.063 (95% CI 0.024-0.100) for working in a healthcare setting, 0.204 (95% CI 0.166-0.241) for living with someone who is self-isolating because they had been in contact with a confirmed case, 0.027 (95% CI 0.015-0.040) for visiting a pub in the preceding 10 days and 0.087 (95% CI 0.021-0.149) for smoking or vaping (Fig. 2). Adjusted ORs for the recoded binary variables 'working in a healthcare setting', 'working in a social care setting' and 'working in a hospitality setting' were reduced slightly to 1.81, 2.07 and 2.65, respectively, but remained significant. Plotting PAF against aOR provides an indication of the relationship between public health and personal risk, and how this might change with changes to interventions.

Discussion
This study provides insight into the factors determining likelihood of testing positive for SARS-CoV-2. The study was carried out at the peak of the second wave of COVID-19 in the UK, and took place in localities which at the time had some of the highest rates of infection in the UK. We have demonstrated that it is possible to rapidly design and implement an epidemiological study to take place alongside a mass testing exercise, without compromising the primary objective of the exercise. We investigated factors determining the likelihood of testing positive for SARS-CoV-2 in order to provide insight into transmission in the community. In this community, household exposures appeared to be important. Household mixing is largely hidden, and may be perceived as lower risk than mixing with people from outside the home [20]. Whilst media attention has focussed on adherence to restrictions affecting social contact outside the home, for example, travelling to exercise, attending work or going to school, transmission within households is being increasingly recognised as an important factor in the epidemiology of SARs-CoV-2 [21][22][23]. The former mining areas of the South Wales Valleys are characterised by close-knit communities, and have similarities with post-industrial towns in the North of England. One in five asymptomatic infections could have been prevented by avoiding contact with someone within the same household. Further work should be carried out to better understand the barriers to infection prevention and control within households, and how best to strengthen prevention and control advice, for example, using online tools [24,25]. Further work could also be done to examine risks associated with household composition, whilst living with children may present an obvious risk, living with older people may increase protective behaviour.
Working in the hospitality sector, and visiting the pub were significant risks but at the time of this study were relatively infrequent exposures. The study took place before national 'lockdown' restrictions were introduced in Wales on 20 December, but were during a time when activity in the hospitality sector was restricted [26,27]. As restrictions on social mixing are relaxed it is likely that exposure in hospitality venues will become of greater public health importance, and people working in this sector should be protected.
Smoking and vaping are potentially modifiable risk factors, and should be investigated further. Evidence for an association between smoking and COVID-19 has been mixed. Some researchers have suggested biological bases for an association. Others have suggested that it may relate to increased 'hand to mouth' contact [28,29]. Smoking confounds other risk behaviours not measured in this study.
Of equal interest are the exposures that were not associated with infection. The policy to close schools and colleges has been debated, with concern that transmission risks are outweighed by the harms caused to children through lost education and socialisation [30]. We found no evidence that education settings provided a significant risk of transmission to adults: working in education, living with someone working in education or living with school age children were not associated with testing positive.
The safety of supermarkets, restaurant, gyms and leisure centre has also been debated [31]. Visiting these facilities did not appear to increase risk of infection. Of course, there is a great variety in the way people behave when shopping or socialising and variation in how shops and leisure facilities apply COVID-safe policies and procedures, and further work would be required to assess these hazards more fully.    Questions were asked about two specific non-pharmaceutical interventions: working from home and the wearing of face masks. Working from home was negatively associated with infection, and remains an important control measure. The results for mask wearing were unclear. In fact, in this study, people testing positive were more likely to report wearing a mask when meeting others inside. Qualitative methods could be used to investigate the behaviours associated with face mask use.

Epidemiology and Infection
With so many associations investigated, it is always possible that some of our associations were chance findings. Moreover, statistically significant negative associations, such as living with an education worker, living with children who attend school, adjusted odds ratios (aOR) for determinants of testing positive for SARS-CoV-2 in two areas of South Wales taking part in a community testing pilot, December 2020. aOR with 95% confidence intervals are given for those factors significant (P < 0.05) in univariate analysis. Odds ratios greater than one represent an increased risk; odds ratios less than one represent a decreased risk. 95% confidence intervals not crossing one reflect that the odds ratio is statistically significant.
visiting a shop or supermarket and attending a face-to-face health appointment may be the result of confounding by an another unknown factor. For example, people attending a face-to-face health appointment may be more likely to be in a clinically vulnerable group and therefore may be mixing less. Selection of variables in the final multivariable model based on statistical significance may have limitations, as exposures only weakly associated in univariate analysis may assume greater importance when combined with other variables [32].
With a response rate of less than 40% it is possible that participants in our study were not representative of those people taking up the offer of testing. Moreover, it is likely that those accepting a test were not representative of people living in the catchment areas. Analysis by Cwm Taf Morgannwg University Health Board found that those taking up testing were older and were resident in less deprived areas of the catchment area.
The questionnaire was designed as a quick online questionnaire, taking participants around 5-10 min to complete, with participants recruited by SMS text message. The personal mobile phone number used to recruit was that given by participants at time of registering for testing at the community testing site, and the number which their lateral flow device test result was subsequently texted to. However, it is possible that some people were excluded from our survey as they did not have a valid mobile phone number of their own, or that their digital literacy level was not sufficient to use the link to our online questionnaire. Although digital tools offer many advantages over traditional paper questionnaires, they do have limitations. Our survey tool only included fully completed questionnaires, whilst quite a large number of questionnaires were partially completed, as the questionnaire was anonymous it is not clear whether these represented failed attempts in people who went on to complete on a second attempt. Also, not all populations have similar access to and expertise in using smart phone technology, and this should be considered when interpreting these types of surveys.
All exposures were self-reported. Although this was an anonymous study, all responses to questions about behaviour may be subject to social desirability bias, and should be interpreted with caution.
As an oversight, we did not include 'gender' on our questionnaire, preventing us from investigating the role of gender in our analysis. Another possible limitation in this study is choice of outcome measure, LFT positivity. LFT is considered to be specific but not particularly sensitive [33][34][35]. There will be some misclassification of cases and controls, but given the prevalence of SARS-CoV-2 in this setting, this is not considered to have had any significant impact on the findings.
The power of the case-control study was restricted by the number of lateral flow device positives, the frequency of certain determinants (e.g. there were only two people in our study reporting working in a prison setting) and our response rate. Factors such as working in a prison whilst no longer significant after adjusting for other variables would warrant further investigation in future studies.
We used population-attributable fractions to assess where interventions could be most effective. However, one should consider this was a very specific population which may not have exposures representative of the general population; and the response rates were different between cases and controls. In addition, population-attributable fractions should be interpreted with caution as these assume all confounding has been controlled for and a causal association between the exposure and the outcome.
Mass testing as a control measure has proved controversial [36,37], but where it is undertaken, associated epidemiological studies can add to the knowledge about transmission risks. Combining this with the calculation of attributable fractions helps to focus on the major drivers of transmission, in order to produce evidence-based responses. Financial support. The study was funded through existing NHS Wales budgets with no external funding.

Conflict of interest. None.
Consent for publication. Patient consent for publication was not required.
Data availability statement. The data that support the findings of this study are available on request from Public Health Wales. However to ensure compliance with information governance arrangements, certain restrictions may apply.