The predictive role of symptoms in COVID-19 diagnostic models: A longitudinal insight

Olivia Bird; Eva P. Galiza; David Neil Baxter; Marta Boffito; Duncan Browne; Fiona Burns; David R. Chadwick; Rebecca Clark; Catherine A. Cosgrove; James Galloway; Anna L. Goodman; Amardeep Heer; Andrew Higham; Shalini Iyengar; Christopher Jeanes; Philip A. Kalra; Christina Kyriakidou; Judy M. Bradley; Chigomezgo Munthali; Angela M. Minassian; Fiona McGill; Patrick Moore; Imrozia Munsoor; Helen Nicholls; Orod Osanlou; Jonathan Packham; Carol H. Pretswell; Alberto San Francisco Ramos; Dinesh Saralaya; Ray P. Sheridan; Richard Smith; Roy L. Soiza; Pauline A. Swift; Emma C. Thomson; Jeremy Turner; Marianne Elizabeth Viljoen; Paul T. Heath; Irina Chis Ster

doi:10.1017/S0950268824000037

Published online by Cambridge University Press: 22 January 2024

Olivia Bird: Affiliation:
Vaccine Institute, St. George’s University of London, St. George’s University Hospitals National Health Service Foundation Trust, London, United Kingdom Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom
Eva P. Galiza: Affiliation:
Vaccine Institute, St. George’s University of London, St. George’s University Hospitals National Health Service Foundation Trust, London, United Kingdom
David Neil Baxter: Affiliation:
Medical Education, Stockport National Health Service Foundation Trust, Stepping Hill Hospital, Stockport, United Kingdom
Marta Boffito: Affiliation:
Chelsea and Westminster Hospital, National Health Service Foundation Trust, London, United Kingdom
Duncan Browne: Affiliation:
Faculty of Medicine, Imperial College London, London, United Kingdom Endocrinology/Diabetes/General Medicine, Royal Cornwall Hospitals National Health Service Trust, Truro, United Kingdom
Fiona Burns: Affiliation:
Faculty of Population Health Sciences, Institute for Global Health, University College London, and Royal Free London National Health Service Foundation Trust, London, United Kingdom
David R. Chadwick: Affiliation:
Centre for Clinical Infection, South Tees Hospitals National Health Service Foundation Trust, James Cook University Hospital, Middlesbrough, United Kingdom
Rebecca Clark: Affiliation:
Layton Medical Centre, Blackpool, United Kingdom
Catherine A. Cosgrove: Affiliation:
Vaccine Institute, St. George’s University of London, St. George’s University Hospitals National Health Service Foundation Trust, London, United Kingdom
James Galloway: Affiliation:
Centre for Rheumatic Disease, Kings College London, London, United Kingdom
Anna L. Goodman: Affiliation:
Department of Infectious Diseases, Guy’s and St Thomas’ National Health Service Foundation Trust, London, United Kingdom Medical Research Council Clinical Trials Unit, University College London, London, United Kingdom
Amardeep Heer: Affiliation:
Lakeside Healthcare Research, Lakeside Surgeries Corby, Northants, United Kingdom
Andrew Higham: Affiliation:
Gastrointestinal and Liver Services, University Hospitals of Morecambe Bay National Health Service Foundation Trust, Kendal, United Kingdom
Shalini Iyengar: Affiliation:
Accelerated Enrollment Solutions, Synexus Hexham Dedicated Research Site, Hexham General Hospital, Hexham, United Kingdom
Christopher Jeanes: Affiliation:
Department of Microbiology, Norfolk and Norwich University Hospitals National Health Service Foundation Trust, Norfolk, United Kingdom
Philip A. Kalra: Affiliation:
Nephrology, Salford Royal Hospital, Northern Care Alliance National Health Service Foundation Trust, Salford, United Kingdom
Christina Kyriakidou: Affiliation:
Accelerated Enrollment Solutions, Synexus Midlands Dedicated Research Site, Birmingham, United Kingdom
Judy M. Bradley: Affiliation:
Dentistry and Biomedical Sciences, School of Medicine, Wellcome-Wolfson Institute for Experimental Medicine, Queen’s University of Belfast, Belfast, United Kingdom
Chigomezgo Munthali: Affiliation:
Accelerated Enrollment Solutions, Synexus Merseyside Dedicated Research Site, Burlington House, Liverpool, United Kingdom
Angela M. Minassian: Affiliation:
Centre for Clinical Vaccinology and Tropical Medicine, University of Oxford, Oxford, United Kingdom Oxford Health National Health Service Foundation Trust, Warneford Hospital, Oxford, United Kingdom
Fiona McGill: Affiliation:
Department of Microbiology, Leeds Teaching Hospitals National Health Service Trust, Leeds, United Kingdom
Patrick Moore: Affiliation:
The Adam Practice, Dorset, United Kingdom University Hospital Southampton National Health Service Foundation Trust, Southampton, United Kingdom
Imrozia Munsoor: Affiliation:
Accelerated Enrollment Solutions, Synexus Glasgow Dedicated Research Site, Glasgow, United Kingdom
Helen Nicholls: Affiliation:
Accelerated Enrollment Solutions, Synexus Wales Dedicated Research Site, Cardiff, United Kingdom
Orod Osanlou: Affiliation:
School of Medical Sciences (Pharmacology/Pharmacy), Bangor University, Wales, United Kingdom Clinical Pharmacology and Therapeutics/General Internal Medicine, Betsi Cadwaladr University Health Board, Wales, United Kingdom
Jonathan Packham: Affiliation:
Academic Unit of Population and Lifespan Sciences, University of Nottingham, Nottingham, United Kingdom Department of Rheumatology, Haywood Hospital, Midlands Partnership National Health Service Foundation Trust, Stafford, United Kingdom
Carol H. Pretswell: Affiliation:
Accelerated Enrollment Solutions, Synexus Lancashire Dedicated Research Site, Matrix Park Buckshaw Village, Chorley, United Kingdom
Alberto San Francisco Ramos: Affiliation:
Vaccine Institute, St. George’s University of London, St. George’s University Hospitals National Health Service Foundation Trust, London, United Kingdom
Dinesh Saralaya: Affiliation:
National Institute for Health Research, Patient Recruitment Centre, Bradford Teaching Hospitals National Health Service Foundation Trust, Bradford, United Kingdom
Ray P. Sheridan: Affiliation:
Geriatric Medicine, Royal Devon University Healthcare, Exeter, United Kingdom
Richard Smith: Affiliation:
Department of Nephrology, East Suffolk and North Essex National Health Service Foundation Trust, Colchester, United Kingdom
Roy L. Soiza: Affiliation:
Aberdeen Royal Infirmary and Ageing Clinical and Experimental Research Group, University of Aberdeen, Aberdeen, United Kingdom
Pauline A. Swift: Affiliation:
Renal Services, Epsom and St Helier University Hospitals National Health Service Trust, London, United Kingdom
Emma C. Thomson: Affiliation:
School of Infection & Immunity, Medical Research Council-University of Glasgow Centre for Virus Research, and Queen Elizabeth University Hospital, National Health Service Greater Glasgow & Clyde, Glasgow, United Kingdom
Jeremy Turner: Affiliation:
Department of Diabetes and Endocrinology, Norfolk and Norwich University Hospitals National Health Service Foundation Trust, Norfolk, United Kingdom
Marianne Elizabeth Viljoen: Affiliation:
Accelerated Enrollment Solutions, Synexus Manchester Dedicated Research Site, Kilburn House, Manchester, United Kingdom
Paul T. Heath: Affiliation:
Vaccine Institute, St. George’s University of London, St. George’s University Hospitals National Health Service Foundation Trust, London, United Kingdom
Irina Chis Ster*: Affiliation:
Institute of Infection and Immunity, George’s University of London, London, United Kingdom
*: Corresponding author: Irina Chis Ster; Email: ichisste@sgul.ac.uk

Abstract

We investigated the symptoms of SARS-CoV-2 infection, their dynamics, and their discriminatory power for the disease using longitudinally, prospectively collected information reported at the time of their occurrence. We analysed data from a large phase 3 clinical UK COVID-19 vaccine trial. The alpha variant was the predominant strain. Participants were assessed for SARS-CoV-2 infection via nasal/throat PCR at recruitment, vaccination appointments, and when symptomatic. Statistical techniques were implemented to infer estimates representative of the UK population, accounting for multiple symptomatic episodes associated with one individual. An optimal diagnostic model for SARS-CoV-2 infection was derived. The 4-month prevalence of SARS-CoV-2 was 2.1%, increasing to 19.4% (16.0%–22.7%) in participants reporting loss of appetite and 31.9% (27.1%–36.8%) in those with anosmia/ageusia. The model identified anosmia and/or ageusia, fever, congestion, and cough to be significantly associated with SARS-CoV-2 infection. Symptom dynamics differed markedly between the two groups: in PCR+ participants symptoms started slowly, peaked later, and lasted longer, whilst in PCR− participants they declined consistently, with, on average, fewer than 3 days of symptoms reported. Anosmia/ageusia peaked late in confirmed SARS-CoV-2 infection (day 12), indicating a low discrimination power for early disease diagnosis.

Keywords

coronavirus longitudinal data symptoms dynamics

Information

Type: Original Paper
Information: Epidemiology & Infection, Volume 152, 2024, e37

DOI: https://doi.org/10.1017/S0950268824000037
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2024. Published by Cambridge University Press

Introduction

The SARS-CoV-2 pandemic has contributed to significant global morbidity and mortality. As of March 7, 2023, there had been over 759 million cases of COVID-19, including 6.8 million deaths [1]. The burden of disease was greatly felt by all public health organizations, particularly by healthcare systems that were frequently put under strain as they managed surges of infections [2]. The unprecedented scale and speed of the pandemic, its similarities to influenza, and the three major foci of care homes, hospitals, and the community, proved to be a challenging combination for devising a standard list of symptoms for COVID-19. Accurate recognition of the symptoms that indicated infection and warranted urgent testing was particularly important in the early stages of the pandemic when polymerase chain reaction (PCR) test kits were in high demand [3].

The gold standard for diagnosing SARS-CoV-2 infection is an oropharyngeal/nasal PCR swab, although lateral flow tests have latterly been used for rapid diagnosis [4]. In the UK, PCR testing was initially prioritized for those presenting with a new (or worsening) cough, fever, or breathlessness [5]. However, other symptoms such as alteration or loss of smell (anosmia) or taste (ageusia), and gastrointestinal symptoms (such as loss of appetite and diarrhoea), have also been associated with COVID-19 [6–8]. In a Cochrane Review (2021), mainly based on more severely affected populations (e.g. hospitalized patients), the pooled specificities for anosmia and ageusia were high (90.5%), suggesting these symptoms may be a useful marker for COVID-19 [9]. The updated review (2022) concluded that most other individual symptoms had poor diagnostic accuracy [10].

In a study of 483 subjects in Washington D.C., 42% of whom were healthcare or essential workers, aged between 25 and 44 years, who retrospectively reported symptoms, 27% were reported to be PCR positive. Wojtusiak et al. concluded that clusters of symptoms are more predictive of COVID-19 than any one specific symptom [11]. In a different study, the same authors also examined the importance of the order of symptom occurrence in deriving a disease diagnostic model [12]. A meta-analysis based on sample data collected from nine established longitudinal cohorts designed a four-category cross-sectional outcome aiming to capture characteristics of long COVID in the UK population [13]. Based on questionnaires completed by subsets of participants between July 2020 and September 2021 and self-reported COVID results as well as the presence/absence of symptoms, the meta-analysis demonstrated considerable heterogeneity between studies [13].

Previous research shows a great deal of variation in data collection methods (e.g. smartphone apps, patient records [14–16]), epidemiological heterogeneity of study populations (e.g. hospitals, intensive care units, care homes [13–15]), and different reporting methods (e.g. self-reports, interviews [17]). As symptoms develop over time, cross-sectional outcomes and retrospectively collected information on symptoms may be difficult to relate to COVID-19 onset, which is also known to have a variable incubation period (2–14 days) [18]. The Zoe Health Study compared three different symptom-based diagnostic models for SARS-CoV-2 and investigated the effect of demographic variables on the models’ performance metrics. It found that the discrimination power of all models improved with the number of days of symptoms included, whilst the most relevant symptoms for detecting COVID-19 were anosmia and chest pain [12].

The phase 3 Novavax COVID-19 clinical trial in the UK was conducted at 33 sites and recruited 15,185 participants [19]. Its primary aim was to evaluate the efficacy and safety of the vaccine. We used the prospectively reported symptoms of possible SARS-CoV-2 infection to assess the discrimination power of individual symptoms and to investigate an optimal combination to generate a diagnostic model for the presence of SARS-CoV-2 infection in the UK population.

Methods

The data for this analysis were provided by Novavax, Inc. [19]. The methods and results of the trial are described elsewhere [19]. Data included are from October 28, 2020 to February 28, 2021.

Monitoring for COVID-19

All participants had a SARS-CoV-2 PCR test performed at recruitment and were tested for symptomatic infections throughout the study. Participants were instructed to contact the study team within 24 h if they self-assessed COVID-19 symptoms (Table 1), triggering a surveillance visit. Throat/nasal swabs were self-collected by participants approximately 24 h after the onset of symptoms, then daily for up to 3 days. A participant with suspected or confirmed COVID-19 was asked to complete a symptom diary, starting on their first day of symptoms, reporting daily for a minimum of 10 days (even if their symptoms resolved and regardless of SARS-CoV-2 PCR result). Participants with confirmed symptomatic COVID-19, signified by a positive PCR test, continued documenting their symptoms until resolution. Virologic confirmation was performed by PCR assay at the U.K. Department of Health and Social Care laboratories with the TaqPath system (Thermo Fisher Scientific).

Table 1. Qualifying symptoms of suspected COVID-19

Statistical methodology

The main objective was to construct an optimal diagnostic model for COVID-19 based on participants’ symptoms and to highlight differences in the dynamics of specific symptoms between participants who experienced COVID-19 and those who did not. To extrapolate the results to the UK population, we started by plotting and empirically comparing the distributions of age, gender, and ethnicity in the sample data with those of the UK population [20–22]. We then used post-stratification techniques for incorporating population demographic distributions [23]. This procedure allowed us to produce estimates generalizable to the UK community population. Weights were derived and assigned to each participant such that the subsequent estimation procedures inflated the effect of under-represented groups (e.g. young participants from ethnic minorities) and depressed the effect of over-represented groups in the sample (e.g. older white participants).
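As a rough illustration of the post-stratification idea described above, the sketch below computes a weight for each stratum as the population share divided by the sample share. The stratum labels and all numbers are invented for illustration and are not taken from the trial or from census data; the weighting procedure actually used is described in the Supplementary material and may differ.

```python
def poststratification_weights(sample_counts, population_shares):
    """Return a weight per stratum: population share / sample share."""
    n = sum(sample_counts.values())
    weights = {}
    for stratum, count in sample_counts.items():
        sample_share = count / n
        weights[stratum] = population_shares[stratum] / sample_share
    return weights


# Hypothetical strata and numbers (NOT from the trial).
sample_counts = {
    "white_under_50": 300,
    "white_50_plus": 600,
    "other_under_50": 60,
    "other_50_plus": 40,
}
population_shares = {
    "white_under_50": 0.45,
    "white_50_plus": 0.40,
    "other_under_50": 0.10,
    "other_50_plus": 0.05,
}

weights = poststratification_weights(sample_counts, population_shares)
# Strata that are over-represented in the sample receive weights below 1,
# under-represented strata receive weights above 1.
```

With weights built this way, the weighted stratum sizes sum back to the original sample size, so totals are preserved while the demographic mix matches the assumed population shares.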

We constructed a master file that included multiple PCR tests per participant and multiple symptomatic episodes. The resulting data have a hierarchical structure with implications on the subsequent choice of analyses and estimation procedures (details in the Supplementary material). Participants were initially grouped by their PCR results, that is, participants with at least one PCR positive result (PCR+) and those always negative (PCR-). We reported the frequency and proportion of the symptomatic participants in the two groups. We estimated the probabilities of testing positive given a specific symptomatic episode and the mean number of reports (or number of days) of a specific symptom within an illness episode. We also investigated the symptom report dynamics and explored the extent to which symptoms were associated with demographics. These analyses identified the main confounder candidates and their potential influence on the subsequent receiver operating characteristic (ROC) analyses.

Non-parametric techniques such as local polynomial smoothing have been used to fit curves on the daily probabilities of the reports in the PCR+ and PCR− participants. A heatmap of daily probabilities of reported symptoms has also been presented in ascending order of their magnitude on the first day in positive patients.
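To make the smoothing step more concrete, here is a minimal sketch of one member of the local polynomial family: a local linear (degree 1) fit with a Gaussian kernel. The polynomial degree, kernel, bandwidth, and the daily proportions below are all assumptions made for illustration; this section does not state which settings were used in the analysis.

```python
import math


def local_linear_smooth(x, y, x0, bandwidth):
    """Estimate the curve at x0 by kernel-weighted least squares on a straight line."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    total = sum(weights)
    x_mean = sum(w * xi for w, xi in zip(weights, x)) / total
    y_mean = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - x_mean) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_mean) * (yi - y_mean)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return y_mean + slope * (x0 - x_mean)


# Hypothetical daily proportions of episodes reporting a symptom (NOT trial data).
days = list(range(1, 11))
proportions = [0.10, 0.14, 0.19, 0.22, 0.27, 0.25, 0.24, 0.20, 0.17, 0.15]
smoothed = [local_linear_smooth(days, proportions, d, bandwidth=1.5) for d in days]
```

A larger bandwidth gives a flatter curve and a smaller one follows the daily values more closely, so the apparent peak day of a symptom can shift somewhat with this choice.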

We assessed the effect of the number of days each specific symptom was reported on the probability of testing PCR+ vs. PCR−, measured as odds ratios and their 95%CIs. We derived a symptom-based diagnostic model using two-level logistic regression and evaluated its discriminatory power using the area under the ROC curve (AUC). We also performed a two-stage ROC analysis [24]. The technique allows multiple episodes to be associated with an individual and allows adjustments using population weights. The result is an estimate of the ROC curve for each specific symptom as a function of age and ethnicity, known as a covariate-specific ROC curve [24]. Using these techniques, we also highlighted the increasing discrimination power of individual symptoms based on the temporally ordered reports restricted to the first 1, 2, 3, and so on up to more than 15 days after the start of the symptomatic illness episode. The effect of age and ethnicity on the discrimination power of individual symptoms was also evaluated. More details are in the Supplementary material.
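For readers unfamiliar with the AUC, the sketch below computes the plain empirical version: the probability that a randomly chosen PCR+ episode has a higher score than a randomly chosen PCR− episode, counting ties as one half. It is unweighted and ignores clustering of episodes within participants, so it is a simplification and not the two-stage ROC regression used in the analysis. The scores are invented days-of-symptom counts.

```python
def auc(scores_pos, scores_neg):
    """Empirical AUC: P(score_pos > score_neg) + 0.5 * P(tie), over all pairs."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical number of days a symptom was reported per episode (NOT trial data).
days_pcr_pos = [0, 3, 5, 7, 2]
days_pcr_neg = [0, 0, 1, 2, 0, 1]
example_auc = auc(days_pcr_pos, days_pcr_neg)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.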

Results

Data summary

Table 2 shows a simplified picture of the data based on a binary assessment. Of 15,139 participants, 317 (2.1%) had a PCR+ episode and 3,320 (21.9%) had at least one symptomatic episode. Of the symptomatic population, 8% (266/3,320) were PCR+, and 84% (266/317) of the PCR+ participants reported symptoms. Figure 1 shows the age distribution against that of the UK population stratified by gender and ethnicity [20–22]. These data have been used to calculate the weights associated with our analyses.
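The percentages in this summary follow directly from the four counts quoted in the text, as the short check below shows. The variable names are ours, and the count of PCR+ participants without a symptomatic episode is derived here by subtraction.

```python
# Counts quoted in the Data summary paragraph.
total_participants = 15139
pcr_positive = 317
symptomatic = 3320
symptomatic_and_positive = 266


def percent(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


prevalence = percent(pcr_positive, total_participants)
symptomatic_share = percent(symptomatic, total_participants)
positive_among_symptomatic = percent(symptomatic_and_positive, symptomatic)
symptomatic_among_positive = percent(symptomatic_and_positive, pcr_positive)

# Derived here by subtraction: PCR+ participants with no symptomatic episode.
positive_without_symptoms = pcr_positive - symptomatic_and_positive
```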

Table 2. PCR and symptomatic status of all study participants; 3,320 (21.9%) of all participants had at least one symptomatic episode and 317 (2.1%) of all had a PCR+ episode

Figure 1. Age distribution in the study sample compared to that of the UK population, stratified by gender and ethnicity.

Table 3 presents demographic data stratified by PCR status. The comorbidities variable indicates the presence of at least one comorbidity. COVID-19 was associated with younger age: each 1-year increase in age decreased the odds of COVID-19 by a small yet significant factor of 0.98 (p < 0.001). Participants from ethnic minorities (other than white) were about twice as likely to test positive as their white counterparts, that is, OR = 1.924 (95%CI (1.169, 3.167)). The other than white category included Asian (n = 462 (3.1%)), Black (n = 60 (0.4%)), and other (n = 153 (1%)) participants.

Table 3. Cohort demographic characteristics stratified by participant PCR status

The ORs measure univariate associations between the PCR status and population characteristics, irrespective of the presence of symptoms. Statistically significant associations are marked in bold.
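A per-year odds ratio such as the 0.98 reported for age looks small, but it compounds over larger age differences. The sketch below illustrates this under the usual logistic-regression assumption that the log-odds are linear in age; because 0.98 is a rounded value, the result is only indicative.

```python
import math

OR_PER_YEAR = 0.98  # per-year odds ratio for age quoted in the text (rounded)


def odds_ratio_for_age_gap(years, or_per_year=OR_PER_YEAR):
    """Odds ratio for an age difference of `years`, assuming log-odds linear in age."""
    return math.exp(years * math.log(or_per_year))


ten_year_or = odds_ratio_for_age_gap(10)
```

Under this assumption, a 10-year age difference corresponds to an odds ratio of roughly 0.82, that is, about 18% lower odds of a PCR+ result for the older participant.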

Summary symptoms data (overall and stratified by PCR status) are presented in the Supplementary Material and illustrated in Figure 2. Runny nose (16.9%) was the most reported symptom in this cohort, followed by cough (14.6%) and tiredness (12.6%). Nausea (5.3%), diarrhoea (4.1%), and anosmia/ageusia (3.6%) were the least reported. This ordering is preserved in PCR− participants; however, in PCR+ participants cough (75.1%) was the most frequent symptom, followed by congestion (74.8%) and tiredness (74.4%). Anosmia/ageusia was reported by 53.3% of PCR+ participants versus 2.5% of PCR− participants.

Figure 2. Proportions of participants with specific symptoms, overall, and stratified by PCR status, as shown in the Supplementary Material. For example, overall, 16.9% of all participants reported runny nose at least once but the figure is much higher (72.6%) among PCR+ contrasting with 15.7% among PCR−.

The probabilities of PCR status by specific symptom reports

Figure 3 shows the probabilities of testing PCR+ conditioned on each symptom (reported at least once). The prevalence of COVID-19 was 31.9% (27.1%–36.8%) in those reporting anosmia/ageusia and 19.4% (16%–22.7%) for loss of appetite.

Figure 3. Predicted probabilities of PCR+ status, stratified by the presence of specific symptoms, and their 95%CIs. Predictions related to each specific symptom are unadjusted for the others and are based on a binary regression with robust standard errors accounting for multiple episodes with events associated with a participant. For example, in participants with loss of taste or smell, regardless of the presence or absence of other symptoms, the probability of a positive PCR test is 0.319 (31.9%).

The number of specific symptom analyses

Figure 4 shows the mean number of days (and their 95%CIs) that each specific symptom was reported during a symptomatic episode, stratified by PCR status. PCR+ participants reported a significantly longer duration of specific symptoms compared to PCR− participants. For example, the mean number of days of cough was 6–7 in PCR+ participants and 2–3 in PCR− participants.

Figure 4. Predicted mean number of days specific symptoms were reported during an episode and their 95%CIs. The red values (PCR+) refer to the left axis and the blue values (PCR−) refer to the right axis. The analysis is restricted to symptomatic participants only. For example, for those participants reporting cough as part of an episode, the mean number of days was 6–7 in PCR+ participants and 2–3 in PCR− participants.

Table 4 presents an exploratory analysis of the rate ratios (fold-effects) as measures of association between the mean number of days of specific symptoms and population characteristics; the same analysis for the PCR+ subgroup is presented in the Supplementary Material. Table 4 shows that age was directly associated with an increased number of reports of runny nose, cough, and loss of appetite, but inversely associated with sore throat and anosmia/ageusia. Women reported 24.3% (95%CI (11.4%, 38.7%)) more headaches than men. Participants other than white reported fewer symptoms than white participants: runny nose by a factor of 0.76 (95%CI (0.65, 0.89)), cough by a factor of 0.77 (95%CI (0.62, 0.95)), and congestion by a factor of 0.77 (95%CI (0.62, 0.96)). Increasing BMI was associated with increased reporting of myalgia (p = 0.033) and breathlessness (p < 0.001). Those with co-morbidities reported 18.5% (95%CI (8.1%, 29.8%)) more days of cough, 16.1% (95%CI (1.9%, 32.2%)) more days of myalgia, and 22.4% (95%CI (3.6%, 44.5%)) more days of breathlessness on average, than those without co-morbidities (Table 4).

Table 4. Fold-effects (rate ratios) of demographics and their 95%CIs on the mean number of days of specific symptoms reported during a symptomatic episode

The estimation uses a Poisson zero-inflated model on the number of reports of an episode and allows for multiple episodes with events associated with one participant. Statistically significant associations are marked in bold.
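For readers less familiar with the model named in the note above, the standard textbook form of a zero-inflated Poisson distribution is sketched below. The symbols \pi (probability of a structural zero), \lambda (Poisson mean), and \beta and x (a coefficient and its covariate) are notation introduced here; the exact parameterization used in the analysis is not stated in this section.

```latex
\begin{align}
P(Y = 0) &= \pi + (1 - \pi)\, e^{-\lambda}, \\
P(Y = k) &= (1 - \pi)\, \frac{e^{-\lambda} \lambda^{k}}{k!}, \qquad k = 1, 2, \dots, \\
E[Y] &= (1 - \pi)\, \lambda, \\
\log \lambda &= \beta_0 + \beta x .
\end{align}
```

Under the log link in the last line, a one-unit increase in x multiplies the Poisson mean \lambda by e^{\beta}, which is the fold-effect. A fold-effect of 1.243, for example, corresponds to the 24.3% more days of headache reported for women.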

In those with a positive PCR result (Supplementary Material), many of these trends remained significant, for example, the effect of age on myalgia (p = 0.039) and loss of appetite (p = 0.012), the effect of gender on headaches (p = 0.033), of ethnicity on congestion (p = 0.002), and of BMI on breathlessness (p = 0.012). Increased BMI was associated with longer duration of cough (p = 0.022).

Figures 5 and 6 present the daily probabilities of specific symptoms (starting with the first report of any symptom), stratified by PCR result. Whilst these probabilities fall swiftly in PCR− participants (Figure 6), they start more slowly and peak later in those with COVID-19 (Figure 5). Fever peaked on the 4th day (24%), followed by chills (27%), whilst myalgia (31%) and loss of appetite (28%) peaked on the 5th day. Anosmia/ageusia (27%) and cough (43%) peaked on the 12th day. These findings are also reflected in Figure 7: symptoms in PCR− participants fall rapidly, as shown by the dark purple, whereas they are later to peak and slower to fade in PCR+ participants, as shown by the changing colour scale.

Figure 5. Daily probabilities of reporting specific symptoms starting with the first report conditioned on PCR+ participants and their corresponding illness episode, that is, ignoring the symptomatic episodes associated with these participants which were PCR-. Non-parametric methodology was used to capture the shape of the individual longitudinal daily reports.

Figure 6. Daily probabilities of reporting specific symptoms starting with the first report using PCR- symptomatic episodes across all participants.

Figure 7. Probabilities of daily occurrences of various symptoms have similar magnitude in the PCR+ and PCR− groups on the first reporting day; they peak later in the course of illness in PCR+ participants and decline in PCR− participants, as also reflected in Figures 5 and 6.

The optimal diagnostic model for testing PCR+ based on symptoms and controlled for population characteristics

Figure 8 presents the effects (ORs) of reporting a specific symptom for 3 days within an episode, on the probability of testing PCR+. The rationale for considering the 3-day symptom effect as a meaningful magnitude for the length of reports was inspired by Figure 4. In this figure, all specific symptoms seem to have a mean of less than 3 days in PCR− participants. Anosmia/ageusia (OR = 14.4 (95%CI 9.2, 22.6)), nausea (OR = 5.8 (95%CI 4.2, 7.9)), loss of appetite (OR = 5.6 (95%CI 4.5, 7.2)), and fever (OR = 5.4 (95%CI 4.2, 6.97)) have the strongest effects in terms of magnitude and statistical significance.

Figure 8. Effect (OR) of reporting a specific symptom for 3 days during an episode, irrespective of other symptoms reported during that episode.

The most parsimonious model, that is, the model with the fewest predictors that still explains the most variability in the data, is shown in Table 5. The model retains anosmia/ageusia (OR = 5.2 (95%CI 3.4, 7.9)), loss of appetite (OR = 2.3 (95%CI 1.6, 3.3)), fever (OR = 1.9 (95%CI 1.4, 2.6)), congestion (OR = 1.9 (95%CI 1.5, 2.4)), and cough (OR = 1.3 (95%CI 1.1, 1.6)) as key symptoms associated with a PCR+ episode, whilst runny nose (OR = 0.7 (95%CI 0.5, 0.9)) and chills (OR = 0.6 (95%CI 0.4, 0.8)) are associated with testing PCR−. This model has a discrimination power of approximately 0.86 in terms of AUC but does not account for population weights.

Table 5. Optimal model for PCR+ based on symptoms and population characteristics on a two-level weighted logistic regression analysis

The adjusted effects of three days of specific reports are shown.

Supplementary Material presents combinations of symptoms predicting the probabilities of COVID-19 using the optimal model. For example, a white participant of 50 years of age would have over 90% probability of testing PCR+ if they reported 3 days of loss of taste and smell, 3 days of loss of appetite, 3 days of fever, and 3 days of cough with 1 day of congestion, runny nose, and chills.
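The sketch below shows how the adjusted odds ratios quoted for Table 5 could combine for the symptom pattern in this example. It assumes that the log-odds are linear in the number of days reported, so that an odds ratio for 3 days scales by days/3 on the log scale; this assumption and the function name are ours. The model intercept and the age and ethnicity terms are not given in this section, so the sketch returns only an odds ratio relative to an otherwise identical participant with no symptom days. It does not reproduce the predicted probability of over 90%.

```python
import math

# Adjusted odds ratios per 3 days of reporting, as quoted in the text for Table 5.
OR_PER_3_DAYS = {
    "anosmia_ageusia": 5.2,
    "loss_of_appetite": 2.3,
    "fever": 1.9,
    "congestion": 1.9,
    "cough": 1.3,
    "runny_nose": 0.7,
    "chills": 0.6,
}


def combined_odds_ratio(days_reported):
    """Combine per-symptom odds ratios for a given pattern of symptom days.

    ASSUMPTION: log-odds are linear in the number of days reported, so each
    3-day odds ratio is scaled by days / 3 on the log scale.
    """
    log_or = 0.0
    for symptom, days in days_reported.items():
        log_or += (days / 3.0) * math.log(OR_PER_3_DAYS[symptom])
    return math.exp(log_or)


# Symptom pattern from the worked example in the text.
example_pattern = {
    "anosmia_ageusia": 3,
    "loss_of_appetite": 3,
    "fever": 3,
    "cough": 3,
    "congestion": 1,
    "runny_nose": 1,
    "chills": 1,
}
ratio = combined_odds_ratio(example_pattern)
```

Under these assumptions the pattern multiplies the baseline odds by a factor of about 27. Converting that to a probability requires the baseline odds for the participant's age and ethnicity, which come from the fitted model.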

The discriminatory power of specific symptoms

Figure 9 shows how the discriminatory power of individual symptoms evolves if only the first number of days after onset is considered – that is only day 1, only days 1–2, only days 1–3, and so on. Symptoms that peak later such as anosmia/ageusia gain discrimination power as the number of days of reporting increases. For other less specific symptoms, the individual discrimination power remains constant or even declines, for example, sore throat peaks very early and then tapers off.

Figure 9. Discrimination power of individual symptoms based on the temporally ordered reports restricted to the first 1, 2, 3 to longer than 15 days after the symptomatic illness episode starts.

The area under the curve in Figure 10 shows the discrimination power of each symptom in the model using the maximum likelihood ROC 2-stage regression analysis (uncontrolled for age and ethnicity and population-weighted). The higher the AUC, the better the symptom discriminates between PCR+ and PCR−. The steep incline of the curve followed by the flattening line suggests that discrimination is little affected as the number of false positives increases.

Figure 10. Estimated discrimination power of each classifier. The plot and the AUC estimate follow a maximum likelihood ROC-weighted regression analysis uncontrolled for age and ethnicity.

When controlled for age and ethnicity, the two-stage ROC model does not quantify their effects on the ROC curve of specific symptoms in a directly interpretable manner, but qualitative conclusions are displayed in Table 6 and visualized in Figure 11. Age and ethnicity affect the ROC curve for each symptom; notably, the discriminatory power of anosmia/ageusia decreases with increasing age and is smaller in ethnic minorities than in participants of white ethnicity.

Table 6. Effect of age and ethnicity on the ROC curve and subsequently on discrimination power associated with each classifier in the model.

The coefficients are only qualitatively interpreted.

Figure 11. Effect of age and ethnicity on the ROC curve and subsequently on discrimination power associated with each classifier in the model. The colours indicating specific symptom are similar to those displayed in Figure 10.

Discussion

The main objectives of this study were to develop a symptom-based diagnostic model for PCR-proven SARS-CoV-2 infection and to investigate the dynamics of the symptoms and their discrimination power for a potential COVID-19 diagnostic model. Our prospective, longitudinal, real-time data collection, together with analytical techniques (post-stratification weights [20–22]) that produce generalizable results for the UK adult community population, provides a better understanding of the dynamics of COVID-19 symptomatology. The rather poor engagement of people other than white in COVID-19 clinical trials has been documented [25], but our method overcame this difficulty.

We found a 4-month prevalence of COVID-19 of 2.1%, in line with the estimated population prevalence at that time [26]. Of the individual symptoms, anosmia and/or ageusia were the least reported overall (3.6%); however, participants reporting them for 3 days were more likely to test positive for COVID-19 (OR = 14.4 (95%CI 9.2, 22.6)). Figure 3 presents the probabilities of testing positive conditioned on symptom reports. Also, of those testing positive for SARS-CoV-2, over half (53.3%) reported the presence of anosmia or ageusia (Figure 2). Other symptoms such as loss of appetite, a new fever, congestion, and cough were strongly associated with a positive result. Fever, cough, and anosmia/ageusia have been identified as the strongest candidates for predicting COVID-19 in studies such as REACT-1 and also in a meta-analysis of 9 studies examining symptoms of COVID-19 and long COVID syndrome [13, 17]. The odds of having COVID-19 have been reported as positively associated with 3 days of reported shortness of breath (OR = 3.1 (95%CI 2.9, 3.3)), although our results do not support it as a ‘leading’ symptom [13]. On its own, runny nose was the most reported symptom (16.9%) in our study, and was frequently reported in those with confirmed COVID-19 (72.6%). The participants reporting it were the least likely (8%) to test positive for COVID-19 (Figure 3) when accounting for the entire episode, and the symptom turned out to have high discriminatory power (AUC = 0.83, Figure 9) in ruling out the disease, consistent with other findings [11, 17].

Unlike many other studies [Reference Lara6–Reference French8, Reference Struyf10, Reference Menni16], this research examined the number of days that specific symptoms are reported within an infection episode. We found that PCR+ participants reported a significantly longer duration of specific symptoms per episode compared with those who were PCR−; cough had the longest duration, followed by tiredness, whilst runny nose had the longest duration among PCR− participants. We also found that cough, anosmia/ageusia, and loss of appetite peaked later in SARS-CoV-2 infection, typically around day 12 (Figure 5). Research in the Czech Republic demonstrated that anosmia and ageusia had a later onset than other symptoms, beginning a median of two or more days after the onset of symptoms and lasting longer than fever or loss of appetite [Reference Weinbergerova27]. These findings are consistent with Wojtusiak et al., who found that headaches, chills, and cough were more relevant if they occurred at onset, whilst loss of taste and smell and loss of appetite had a higher relevance if they occurred later in the infection [Reference Wojtusiak12].

Previous research has suggested that individual symptoms are not predictive of COVID-19 on their own. Our analysis suggests that individual symptoms would not have had sufficient predictive power for COVID-19 early in their occurrence but that this power would increase with the number of days on which they manifest (Figure 9). Hence, our final predictive model is based on specific symptomatic episodes, that is, on the total number of symptomatic days within an episode, adjusted for age and ethnicity. The model retained episodes of anosmia/ageusia, loss of appetite, fever, congestion, and cough as all positively associated with testing PCR+, together with runny nose, chills, and age as all negatively associated with testing PCR+ (Table 5), consistent with other findings [Reference Rodriguez-Palacios28]. The concept of 3 days as a meaningful magnitude for the length of reports was inspired by Figure 4, in which all symptoms had a mean of less than 3 days in PCR− participants. In light of this, the number of days for which symptoms have been experienced by subjects presenting for hospital care may be particularly useful information at the time of clinical triage. The model, based on two-level logistic regression, has a discriminating power of ~86%.

Our ROC analysis showed that the discrimination power of anosmia/ageusia increased from irrelevance during the first few days to exceeding that of all other symptoms after day 9 (Figure 9). Our report also showed that the discriminatory power of anosmia/ageusia decreases with age, which may reflect a biological phenomenon associated with ageing [Reference Boyce29]. Cough alone remained relatively constant in its discrimination power; however, PCR− participants also reported prolonged cough. Our data do not support diarrhoea as a candidate symptom of COVID-19.
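The way discrimination power can change as more days of an episode are observed may be sketched as follows. This is an illustrative, unweighted calculation on hypothetical episodes (lists of daily 0/1 reports for one symptom), using the rank-based definition of the AUC; it is not the maximum likelihood ROC-weighted regression used in the study.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random PCR+ episode scores higher than a random PCR- one (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def days_reported(episode, first_k):
    """Number of days the symptom was reported within the first k days of the episode."""
    return sum(episode[:first_k])


# Hypothetical episodes for a late-onset symptom; not study data.
pcr_pos = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
pcr_neg = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
]

auc_by_window = {
    k: auc(
        [days_reported(e, k) for e in pcr_pos],
        [days_reported(e, k) for e in pcr_neg],
    )
    for k in (2, 8)
}
```

In this toy example the symptom is uninformative when only the first 2 days are observed (AUC of 0.5) and separates the groups perfectly once all 8 days are included, mirroring the qualitative pattern described for anosmia/ageusia.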

Two-stage ROC analysis suggests that the prediction power may be less discriminatory in older participants and in those from ethnic minorities; this was true for all symptoms. By comparison, the Canas et al. model showed better discrimination in participants of normal weight than in those who were underweight and/or overweight, and in non-healthcare workers, and, consistent with our results, found that younger people were more likely to test PCR+, possibly due to increased social mixing [Reference Canas15]. Our diagnostic model is similar to this model, which identified persistent cough and loss of smell, alongside abdominal pain and myalgia, as early features of COVID-19 [Reference Canas15]. However, the Canas model had a younger population than our study (mean age 46.7 years vs. 53.1 years) and COVID-19 was self-reported, so the results are difficult to compare [Reference Canas15]. Moreover, the study reported ‘blisters on the feet’ and ‘eye soreness’ as relevant features of COVID-19, the significance of which the paper itself questions [Reference Canas15].

Our estimated prevalences of specific symptoms among both positive and negative groups are higher than those presented in the meta-analysis by Bowyer et al. [Reference Bowyer13]. Although the participants in that study stem from nine longitudinal cohorts, the data collection is essentially retrospective and cross-sectional. The authors reported a great deal of heterogeneity. Notably, their data were collected during the summer, whilst ours were collected during the winter, including Christmas, when transmission intensified; hence, we postulate that the variation could be attributable to the season. Our prevalences of specific symptoms among PCR+ and PCR− participants are closest to those from the Generation Scotland cohort (access via Bowyer et al. or from the University of Edinburgh) [Reference Bowyer13, 30], consistent with our explanation above, given the somewhat cooler temperatures in Scotland during the summer. We have retrieved some partial information and appended a relevant comparative table in the Supplementary material.

Though multiple centres participated in the clinical trial, the three-level regression techniques did not reveal important differences in the estimates or their standard errors. Variability between the centres was not expected to be significant as the same trial protocol and procedures were used. We have disregarded the effect of the intervention (placebo or vaccine), as preliminary analysis did not show a significant impact on results (data not shown).

Limitations

Despite the data being gathered prospectively and in real time, we observed gaps in daily records; for example, a participant may report fever for 3 consecutive days, then none on the fourth day, and then again on the fifth and sixth days. The statistical analysis considered the number of reports (i.e. the number of days with specific symptoms) rather than the whole length of time they were experienced. This may have led to underestimating their effect; however, we are confident that recall bias has been minimized to a greater extent than if the data had been collected retrospectively by self-report. Asymptomatic infections are likely to be underrepresented in this analysis. As this research set out to explore symptoms of COVID-19, we do not believe this to be a major limitation to our analysis, but it does mean we cannot calculate the true prevalence of COVID-19 infections in the study population. Unfortunately, we also did not benefit from information such as recent contacts or travel/work patterns, which could have been useful in building a reliable diagnostic model as suggested by the Cochrane Review article [Reference Struyf10]. At the time of data collection, the circulating strain of SARS-CoV-2 was the alpha variant [31]. However, omicron has a higher tropism for nasoepithelial cells than pulmonary cells [Reference Willett32], and anosmia has been reported less frequently with the omicron variant [Reference Menni33]. Therefore, care should be taken if applying the model outside our study population.
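The distinction drawn above between the number of reports and the whole length of time a symptom was experienced can be made concrete with the fever example from this section. The helper names below are hypothetical and serve only to illustrate the two quantities.

```python
def report_days(daily_reports):
    """Number of days on which the symptom was reported (the quantity analysed)."""
    return sum(1 for reported in daily_reports if reported)


def span_days(daily_reports):
    """Days from the first to the last report, inclusive, counting gaps in between."""
    reported_idx = [i for i, reported in enumerate(daily_reports) if reported]
    if not reported_idx:
        return 0
    return reported_idx[-1] - reported_idx[0] + 1


# Fever on days 1-3, no report on day 4, fever again on days 5 and 6.
fever = [1, 1, 1, 0, 1, 1]
n_reports = report_days(fever)  # 5
n_span = span_days(fever)       # 6
```

Because the analysis uses the first quantity, an episode with unreported days contributes fewer symptomatic days than its full span, which is why the effect may be underestimated.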

Conclusion

This research adds to the body of literature on COVID-19 symptoms as an in-depth exploration of symptoms reported by those unaware of their diagnosis at the time of reporting, thereby minimizing reporting bias. We found that younger participants and those from ethnic minorities were more likely to test positive for COVID-19 and, consistent with previous research, that anosmia and/or ageusia most strongly predict a positive PCR result; however, we have also shown that these symptoms peak late in infection. This calls into question their consideration as early markers of the disease. Similar to other research, we found that fever, congestion, and cough are all positively associated with COVID-19, with PCR+ participants reporting more days of symptoms, for example, cough, than those who were PCR−. We also found that diarrhoea, runny nose, and chills are not indicative of COVID-19. Overall, our model has a discriminating power of 86% to predict COVID-19, although, as anosmia and ageusia often develop later in the infection, our proposed model is unlikely to identify early infections, particularly in the elderly or those from ethnic minorities.

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/S0950268824000037.

Data availability statement

The data are available upon request and subject to Novavax’s permission. Please contact Professor Paul T. Heath, pheath@sgul.ac.uk.

Acknowledgements

2019nCoV-302 Study Group Members:

The NVX-CoV2373-2019nCoV-302 clinical trial was a collective group effort across multiple institutions and locations. Below is a list of sites and staff that significantly contributed to the implementation and conduct of the NVX-CoV2373-2019nCoV-302 clinical trial.

Author contribution

Writing – review & editing: A.S.F.R., I.C.S., A.H., C.M., C.K., A.H., A.M.M., A.L.G., C.H.P., C.C., C.J., D.N.B., D.R.C., D.S., D.B., E.P.G., E.T., F.B., F.M., H.N., I.M., J.G., J.T., J.P., J.M.B., M.E.V., M.B., O.O., P.M., P.A.S., P.T.H., P.A.K., R.P.S., R.C., R.S., R.L.S., S.I., O.B.; Conceptualization: I.C.S., P.T.H., O.B.; Formal analysis: I.C.S., P.T.H., O.B.; Methodology: I.C.S., P.T.H., O.B.; Supervision: I.C.S., P.T.H.; Validation: I.C.S.; Writing – original draft: I.C.S., P.T.H., O.B.; Investigation: P.T.H.

Funding statement

This study received no specific funding.

Competing interest

C.A.C. reports receiving grant support, paid to her institution, from Novavax, Moderna, and GSK. A.L.G. reports receiving grant support, paid to her institution, from Novavax and having entered into a partnership with AstraZeneca for further development of ChAdOx1 nCoV-19. A.L.G. is named as an inventor on a patent covering the use of a particular promoter construct that is often used in vectored vaccines and is incorporated in the ChAdOx1 nCoV-19 vaccine and may benefit from royalty income paid to the University of Oxford from sales of this vaccine by AstraZeneca and its sublicensees under the university’s revenue sharing policy. P.T.H. reports receiving grant support, paid to his institution, from Novavax, Pfizer, Moderna, Valneva, Janssen, and AstraZeneca. I.C.S. declares receiving grant support, paid to her institution, from NIHR and AstraZeneca. Other authors reported no competing interest.

Disclaimer

The findings and conclusions presented here are the authors’ and do not necessarily represent the views of Novavax, although the affiliated authors were given the opportunity to review the submission and provide feedback.

Footnotes

P.T.H. and I.C.S. contributed equally to this work.

References

1. World Health Organisation (2023) WHO Coronavirus (COVID-19) Dashboard, WHO Coronavirus (COVID-19) Dashboard With Vaccination Data. Available at https://covid19.who.int/ (accessed 7 March 2023).

2. Emanuel, EJ, et al. (2020) Fair allocation of scarce medical resources in the time of COVID-19. The New England Journal of Medicine 382, 2049–2055.

3. Rossman, H, et al. (2020) A framework for identifying regional outbreak and spread of COVID-19 from one-minute population-wide surveys. Nature Medicine 26(5), 634–638.

4. National Health Service (2022) Main Symptoms of Coronavirus (COVID-19) – NHS. Available at https://www.nhs.uk/conditions/coronavirus-covid-19/symptoms/main-symptoms/ (accessed 30 March 2022).

5. Johnson-León, M, et al. (2021) Executive summary: It’s wrong not to test: The case for universal, frequent rapid COVID-19 testing. eClinicalMedicine 33, 100759.

6. Lara, BA, et al. (2021) Clinical prediction tool to assess the likelihood of a positive SARS-Cov-2 (COVID-19) polymerase chain reaction test in patients with flu-like symptoms. Western Journal of Emergency Medicine 22(3), 592–598.

7. Duque, MP, et al. (2021) COVID-19 symptoms: A case-control study, Portugal, March–April 2020. Epidemiology & Infection 149, e54.

8. French, N, et al. (2021) Creating symptom-based criteria for diagnostic testing: A case study based on a multivariate analysis of data collected during the first wave of the COVID-19 pandemic in New Zealand. BioMed Central Infectious Diseases 21(1), 1119.

9. Struyf, T, et al. (2021) Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19. Cochrane Database of Systematic Reviews 2(2), CD013665.

10. Struyf, T, et al. (2022) Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID‐19. Cochrane Database of Systematic Reviews 5(5), CD013665.

11. Wojtusiak, J, et al. (2023) The role of symptom clusters in triage of COVID-19 patients. Quality Management in Health Care 32(suppl. 1), S21–S28.

12. Wojtusiak, J, et al. (2023) Order of occurrence of COVID-19 symptoms. Quality Management in Health Care 32(suppl. 1), S29–S34.

13. Bowyer, RCE, et al. (2023) Characterising patterns of COVID-19 and long COVID symptoms: Evidence from nine UK longitudinal studies. European Journal of Epidemiology 38(2), 199–210.

14. Drew, DA, et al. (2020) Rapid implementation of mobile technology for real-time epidemiology of COVID-19. Science 368(6497), 1362–1367.

15. Canas, LS, et al. (2021) Early detection of COVID-19 in the UK using self-reported symptoms: A large-scale, prospective, epidemiological surveillance study. The Lancet Digital Health 3(9), e587–e598.

16. Menni, C, et al. (2020) Real-time tracking of self-reported symptoms to predict potential COVID-19. Nature Medicine 26(7), 1037–1040.

17. Elliott, J, et al. (2021) Predictive symptoms for COVID-19 in the community: REACT-1 study of over 1 million people. Public Library of Science Medicine 18(9), e1003777.

18. Lauer, SA, et al. (2020) The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application. Annals of Internal Medicine 172(9), 577–582.

19. Heath, PT, et al. (2021) Safety and efficacy of NVX-CoV2373 COVID-19 vaccine. New England Journal of Medicine 385(13), 1172–1183.

20. Office for National Statistics (2023) Population Estimates for the UK, England and Wales, Scotland and Northern Ireland. Office for National Statistics. Available at https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/bulletins/annualmidyearpopulationestimates/mid2020#age-structure-of-the-uk-population (accessed 27 January 2023).

21. Office for National Statistics (2023) Population Estimates by Ethnic Group, England and Wales. Office for National Statistics. Available at https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/ethnicity/datasets/populationestimatesbyethnicgroupenglandandwales (accessed 27 January 2023).

22. Office for National Statistics (2023) Estimates of the Population for the UK, England, Wales, Scotland and Northern Ireland. Office for National Statistics. Available at https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationestimatesforukenglandandwalesscotlandandnorthernireland (accessed 27 January 2023).

23. Little, RJA (1993) Post-stratification: A modeler’s perspective. Journal of the American Statistical Association 88(423), 1001–1012.

24. Alonzo, TA, et al. (2002) Distribution-free ROC analysis using binary regression techniques. Biostatistics 3(3), 421–432.

25. Murali, M, et al. (2023) Ethnic minority representation in UK COVID-19 trials: Systematic review and meta-analysis. BioMed Central Medicine 21, 111.

26. Office for National Statistics (2023) Coronavirus (COVID-19) Infection Survey, UK – Office for National Statistics. Available at https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/19february2021 (accessed 15 January 2023).

27. Weinbergerova, BE, et al. (2021) COVID-19’s natural course among ambulatory monitored outpatients. Scientific Reports 11(1), 10124.

28. Rodriguez-Palacios, A, et al. (2020) Modeling the onset of symptoms of COVID-19. Frontiers in Public Health 8, 473.

29. Boyce, JM, et al. (2006) Effects of ageing on smell and taste. Postgraduate Medical Journal 82(966), 239–241.

30. Generation Scotland (2023) Access Our Resources. University of Edinburgh. Available at https://www.ed.ac.uk/generation-scotland/for-researchers/access (accessed 28 October 2023).

31. Public Health England (2023) SARS-CoV-2 Variants of Concern and Variants under Investigation in England. Available at https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/975742/Variants_of_Concern_VOC_Technical_Briefing_8_England.pdf (accessed 8 May 2023).

32. Willett, BJ, et al. (2022) SARS-CoV-2 omicron is an immune escape variant with an altered cell entry pathway. Nature Microbiology 7(8), 1161–1179.

33. Menni, C, et al. (2022) Symptom prevalence, duration, and risk of hospital admission in individuals infected with SARS-CoV-2 during periods of omicron and delta variant dominance: A prospective observational study from the ZOE COVID study. The Lancet 399(10335), 1618–1624.

Table 1. Qualifying symptoms of suspected COVID-19

Table 2. PCR and symptomatic status of all study participants; 3,320 (21.9%) of all participants had at least one symptomatic episode and 317 (2.1%) of all had a PCR+ episode

Figure 1. Age distribution in the study sample compared to that of the UK population, stratified by gender and ethnicity.

Table 3. Cohort demographic characteristics stratified by participant PCR status

Table 4. Fold-effects (risk ratios) of demographics and their 95%CIs on the mean number of days of specific symptoms reported during a symptomatic episode

Figure 6. Daily probabilities of reporting specific symptoms starting with the first report using PCR- symptomatic episodes across all participants.

Figure 8. Effect (OR) of reporting a specific symptom for 3 days during an episode, irrespective of other symptoms reported during that episode.

Table 5. Optimal model for PCR+ based on symptoms and population characteristics on a two-level weighted logistic regression analysis

Figure 9. Discrimination power of individual symptoms based on the temporally ordered reports restricted to the first 1, 2, 3 to longer than 15 days after the symptomatic illness episode starts.

Figure 10. Estimated discrimination power of each classifier. The plot and the AUC estimate follow a maximum likelihood ROC-weighted regression analysis uncontrolled for age and ethnicity.

Table 6. Effect of age and ethnicity on the ROC curve and subsequently on discrimination power associated with each classifier in the model.
