In the current coronavirus disease (COVID-19) pandemic, the first assessment of patients presenting with respiratory symptoms is virtually always performed by telephone (Greenhalgh et al., Reference Greenhalgh, Koh and Car2020a, Greenhalgh et al., Reference Greenhalgh, Wherton, Shaw and Morrison2020b, Smith et al., Reference Smith, Thomas, Snoswell, Haydon, Mehrotra, Clemensen and Caffery2020). Furthermore, monitoring of COVID-19 is also primarily done remotely, unless progression of symptoms warrants in-person evaluation (Greenhalgh et al., Reference Greenhalgh, Wherton, Shaw and Morrison2020b, Greenhalgh et al., Reference Greenhalgh, Koh and Car2020a, Smith et al., Reference Smith, Thomas, Snoswell, Haydon, Mehrotra, Clemensen and Caffery2020, Huang et al., Reference Huang, Wang, Li, Ren, Zhao, Hu, Zhang, Fan, Xu, Gu, Cheng, Yu, Xia, Wei, Wu, Xie, Yin, Li, Liu, Xiao, Gao, Guo, Xie, Wang, Jiang, Gao, Jin, Wang and Cao2020). In order to assess whether COVID-19 deteriorates, telephone assessment focusses on deteriorating respiratory function (Greenhalgh et al., Reference Greenhalgh, Koh and Car2020a, The Centre for Evidence-Based Medicine, 2020b, Huang et al., Reference Huang, Wang, Li, Ren, Zhao, Hu, Zhang, Fan, Xu, Gu, Cheng, Yu, Xia, Wei, Wu, Xie, Yin, Li, Liu, Xiao, Gao, Guo, Xie, Wang, Jiang, Gao, Jin, Wang and Cao2020, The Centre for Evidence-Based Medicine, 2020a). Unfortunately, an accurate assessment of the patient’s respiratory status can be challenging during telephone triage, as patients with COVID-19 may be hypoxemic without presenting with typical warning signs, such as dyspnoea (O’Driscoll et al., Reference O’Driscoll, Howard, Earis and Mak2017, Huang et al., Reference Huang, Wang, Li, Ren, Zhao, Hu, Zhang, Fan, Xu, Gu, Cheng, Yu, Xia, Wei, Wu, Xie, Yin, Li, Liu, Xiao, Gao, Guo, Xie, Wang, Jiang, Gao, Jin, Wang and Cao2020, Greenhalgh et al., Reference Greenhalgh, Koh and Car2020a, The Centre for Evidence-Based Medicine, 2020a, The Centre for Evidence-Based Medicine, 2020b, Ottestad et al., Reference Ottestad, Seim and Maehlen2020, Teo, Reference Teo2020, Chen et al., Reference Chen, Wu, Chen, Yan, Yang, Chen, Ma, Xu, Yu, Wang, Wang, Guo, Chen, Ding, Zhang, Huang, Han, Li, Luo, Zhao and Ning2020, Xie et al., Reference Xie, Covassin, Fan, Singh, Gao, Li, Kara and Somers2020). Early on in the pandemic, a simple breathing test was therefore promoted through medically oriented social media channels to assist in the detection of hypoxaemia during telephone triage (Chorin et al., Reference Chorin, Padegimas, Havakuk, Birati, Shacham, Milman, Topaz, Flint, Keren and Rogowski2016, The Centre for Evidence-Based Medicine, 2020b). This breathing test, named as the ‘Roth score’, was developed in an in-hospital population of patients with cardiopulmonary pathology and was based on a patient’s ability to count up to 30 in a single exhalation (counting number), as well as the time taken to reach a maximal count in seconds (counting time) (Chorin et al., Reference Chorin, Padegimas, Havakuk, Birati, Shacham, Milman, Topaz, Flint, Keren and Rogowski2016). This score showed a strong positive correlation with pulse oximetry and good discrimination for detecting hypoxaemia. However, while promising, the Roth score was never externally validated, and certainly not in a community-based setting (Chorin et al., Reference Chorin, Padegimas, Havakuk, Birati, Shacham, Milman, Topaz, Flint, Keren and Rogowski2016, The Centre for Evidence-Based Medicine, 2020b, The Centre for Evidence-Based Medicine, 2020a). Providing primary care physicians and telephone triagists with a reliable score to be used in the remote assessment of possible hypoxemic patients could be crucial in the ongoing COVID-19 pandemic. The purpose of the current study was, therefore, to determine the diagnostic accuracy of the Roth score as a tool assisting in the diagnosis of hypoxaemia, compared with pulse oximetry as the reference standard, in suspected COVID-19 patients in general practice.
We reported the methods and findings of this study confirm to the Standards for Reporting of Diagnostic Accuracy Studies (STARD 2015) (Bossuyt et al., Reference Bossuyt, Reitsma, Bruns, Gatsonis, Glasziou, Irwig, Lijmer, Moher, Rennie, De Vet, Kressel, Rifai, Golub, Altman, Hooft, Korevaar, Cohen and Group2015).
We performed a cross-sectional, clinical validation study in which we asked general practitioners (GPs), situated in primary care practices throughout the Netherlands, to evaluate patients for eligibility and enrol patients during consultation. GPs were approached to participate both structurally, by coordinating healthcare organisations, and personally, through word-of-mouth referral and social network channels. As such, recruitment of physicians occurred via virtual snowball sampling. For each patient, the index test, reference standard and data collection were performed in a single consultation. Given the cross-sectional nature of this study, no follow-up data were collected.
Eligible patients were at least 18 years of age who were presented with symptoms suggestive for or caused by a confirmed severe acute respiratory syndrome coronavirus (SARS-CoV-2) infection, and was able to perform the Roth test.
We estimated the prevalence of patients with simultaneous pulse oximetry (SpO2) < 95% to be approximately 15% in general practice, which requires a minimal sample of 155 subjects to achieve a minimum power of 80% to detect a change in the percentage value of sensitivity from 0.70 to 0.90, based on a target significance level of 0.05. This sample size is also sufficient to detect a change specificity value from 0.70 to 0.90, which requires a sample of 39 subjects (Bujang and Adnan, Reference Bujang and Adnan2016).
The Roth score (index test)
Patients were instructed to take a deep breath and subsequently count out loud, as fast as possible, from 1 to 30, during a single exhalation. The Roth score includes two measurements: (a) the counting time as a measure for the duration of time in seconds between counting from 1 to 30 in one breath, or until the next inhalation and (b) the counting number as a measure for the highest number counted in one breath (Chorin et al., Reference Chorin, Padegimas, Havakuk, Birati, Shacham, Milman, Topaz, Flint, Keren and Rogowski2016). In the derivation study, a maximal counting number <15 and a counting time <8 s were found to be optimal for identifying patients with a room air SpO2 < 95% (Chorin et al., Reference Chorin, Padegimas, Havakuk, Birati, Shacham, Milman, Topaz, Flint, Keren and Rogowski2016).
Pulse oximetry (reference standard) and definition of hypoxaemia
Peripheral oxygen saturation (SpO2) on room air, measured by validated pulse oximeters, was used as the reference standard for the detection of hypoxaemia in this study. This measurement is non-invasive, easily executable in general practice and provides a reflection of the true arterial oxygen saturation (SaO2) as determined by arterial blood gas (ABG) analysis, which is the golden standard for detecting hypoxaemia (O’Driscoll et al., Reference O’Driscoll, Howard, Earis and Mak2017).
Since there is no exact threshold of oxygen saturation below which a patient becomes hypoxemic, the threshold varies between a SpO2 of 90% and 95% amongst studies (Kelly et al., Reference Kelly, Mcalpine and Kyle2001, O’Driscoll et al., Reference O’Driscoll, Howard, Earis and Mak2017, Greenhalgh et al., Reference Greenhalgh, Koh and Car2020a). We focused on a SpO2 of 95% as the primary cut-off value, as saturation levels below this threshold are considered high risk for respiratory deterioration in patients without pre-existing pulmonary disease. Second, we also determined the diagnostic accuracy for identifying SpO2 < 90%, in accordance with the derivation study (Chorin et al., Reference Chorin, Padegimas, Havakuk, Birati, Shacham, Milman, Topaz, Flint, Keren and Rogowski2016).
GPs entered data by using an electronic case report form (eCRF) that was accessible through a dedicated website (https://www.rothscore.nl). Instructions regarding the execution of the test were provided both textually and by video demonstration on the website. The eCRF consisted of index and reference standard measurements, demographic data, clinical manifestations, vital parameters, underlying comorbidities, the use of immunosuppressant therapy, smoking status and COVID-19 exposure history. The likelihood of SARS-CoV-2 infection was left at the discretion of the participating clinician, either based on a SARS-CoV-2 polymerase chain reaction (PCR) test or based on clinical suspicion.
Outcomes of interest
We studied the correlation between the Roth score, consisting of counting time and counting number, and SpO2 on room air, with pulse oximetry as the reference standard. Subsequently, we assessed the diagnostic accuracy measures of discrimination (c-statistic), sensitivity, specificity, the positive and negative predictive values of the Roth score as an instrument for detecting hypoxaemia (SpO2 < 95% and <90%).
The baseline characteristics are described as proportions, means or medians with corresponding dispersion measures. We displayed the correlations between counting number, counting time and SpO2 on room air in scatter plots. We computed correlation coefficients with corresponding 95% confidence intervals (CIs) by using bias-corrected and accelerated (BCa) bootstrapping. Correlations of 0.1 were considered weak, 0.4 moderate and 0.7 strong (Akoglu, Reference Akoglu2018). We constructed a Receiver Operating Characteristic (ROC) curve and computed the c-statistic with corresponding 95% CI to present the discriminatory ability of the index test. The Roth score’s diagnostic accuracy for detecting SpO2 < 95% and <90% was determined by calculating the sensitivity (SENS), specificity (SPEC), positive predictive value (PPV) and negative predictive value (NPV) for the cut-off values of counting number and counting time with 95% CIs. Statistical analyses were performed by using IBM SPSS 26.0 and MedCalc 2020.
Patient flow and baseline characteristics
A total of 109 patients were enrolled in this study by 33 independent GPs between 4 April 2020 and 19 June 2020. Of those, four patients did not meet the eligibility criteria and were excluded from data analysis (Figure 1). The final study population consisted of 105 individuals (52.4% female, mean age of 52.6 ± 20.4 years), from whom the baseline characteristics are shown in Table 1. The majority of patients (n = 91, 90.1%) presented in GP offices, while 10 patients (9.9%) presented in out-of-hours GP urgent care centres. Of all patients, 11 (10.5%) were known PCR positive for SARS-CoV-2 at the time of presentation, whereas the diagnosis was considered likely by the assessing physician in 53 (50.5%) patients. The predominant presenting symptoms were coughing (61%), dyspnoea (58.1%) and exhaustion (56.2%). The most frequent occurring comorbidities were hypertension (31.4%), pulmonary disease (23.8%) and diabetes mellitus (10.5%). The median oxygen saturation at presentation was 98.0% [interquartile range (IQR) 96.5–98.5], heart rate 84.0 bpm (IQR 72.8–97.0), respiratory rate 16.0 breaths/min (IQR 14.0–18.0), body temperature 37.1°C (IQR 36.7–37.7). Of all included patients, a total of 15 individuals (14.3%) had a SpO2 < 95%, of which 4 individuals (3.8%) <90% (Table 2). Supplementary Table 1 shows the distribution of patients over the counting number and counting time categories.
SD = standard deviation; 25th = first quartile; 75th = third quartile; bpm = beats per minute; min = minute; °C = degrees Celsius; COPD = Chronic Obstructive Pulmonary Disease; COVID-19 = coronavirus disease 2019; GP = General Practitioner.
SpO2 = peripheral oxygen saturation; N = number; *, of which 4 < 90%; 25th = first quartile; 75th = third quartile; SD = standard deviation.
Correlation between the Roth score and pulse oximetry
Figure 2 shows the correlations between counting number, counting time and SpO2 on room air. The correlation analysis showed a moderately positive, linear correlation between counting number and pulse oximetry, and a weak positive, linear correlation between counting time and pulse oximetry.
Diagnostic accuracy of the Roth score
As shown in Table 2, patients with a SpO2 ≥ 95% had higher median counting number and mean counting time, compared to those with lower SpO2 values. Figure 3 shows the discrimination plots of hypoxaemia for counting number (c-statistic: 0.91) and counting time (c-statistic: 0.77), respectively. Diagnostic accuracy in terms of SENS and SPEC are shown in Table 3 and Supplementary Table 2. Of all tested cut-off values, optimal accuracy was found for a counting number of 20 (SENS 93.3%, SPEC 77.8%, PPV 41.2%, NPV 98.6%) and a counting time of 7 s (SENS 85.7%, SPEC 81.1%, PPV 41.4%, NPV 97.3%).
SENS = sensitivity; SPEC = specificity; PPV = positive predictive value; NPV = negative predictive value; max = maximum; s = seconds; CN = counting number; CT = counting time.
In this study, we assessed the diagnostic accuracy of the Roth score as an instrument for assisting in the detection of hypoxaemia in patients with SARS-CoV-2 symptoms in a primary care-based setting during the early phase of the COVID-19 pandemic. Our study showed good discriminatory ability of counting number and counting time for identifying hypoxaemia (SpO2 < 95%). The optimal cut-off value for identifying a room air oxygen saturation <95% in terms of sensitivity, specificity and predictive values was found for a maximal counting number of 20 and a counting time of 7 s.
Strengths and limitations
Our study provides important data on the validity of the Roth score in identifying patients with hypoxaemia in general practice. This is a relevant subject in the COVID-19 pandemic in which patients are more frequently assessed remotely. The use of social media channels for physician recruitment largely contributed to the expanse of the sample size and geographical scope of this study (Baltar and Brunet, Reference Baltar and Brunet2012). Another strength lies in the simplicity of the number of items we asked physicians to record, resulting in nearly all patients receiving both the index test and the reference standard. Each patient received the same reference standard, directly after the index test, herby avoiding partial verification bias and condition progression bias.
Several limitations should be mentioned. During the inclusion phase of our study, the number of new COVID-19 cases decreased substantially (at the end of the first wave), which meant we could enrol 105 of the projected 155 patients. Consequentially, we were able to include a smaller number of patients with hypoxaemia, resulting in reduced precision of the diagnostic accuracy estimates of the Roth score.
Second, the use of snowball sampling for the recruitment of physicians may have introduced selection bias.
Third, we included a relatively young patient sample compared to age categories in which the highest COVID-19 morbidity and mortality are found potentially affecting our diagnostic accuracy results.
Fourth, we used pulse oximetry as the reference standard for hypoxaemia in this study. Although pulse oximeters reflect the SaO2 accurately at saturations above 88%, they are less reliable at lower oxygen saturation, potentially introducing misclassification bias (O’Driscoll et al., Reference O’Driscoll, Howard, Earis and Mak2017). Fifth, for the execution of both the index test and the reference standard, standardised instructions were provided on the website. However, given a large number of test performers, it is possible that small differences exist in the execution of both tests, possibly affecting estimates of diagnostic accuracy. Finally, the possible added value of the Roth score lies in telephone triage situations where pulse oximeter readings are unavailable. Our study, however, did not test the Roth score in a telephone triage setting, but instead during physical assessment.
To our knowledge, the derivation study of the Roth score, led by Chorin et al, is the only prior study assessing the validity of the Roth score as a tool for detecting hypoxaemia (Chorin et al., Reference Chorin, Padegimas, Havakuk, Birati, Shacham, Milman, Topaz, Flint, Keren and Rogowski2016). The participants of that study were markedly different from ours, as these involved patients admitted to a coronary care unit due to predominantly cardiac morbidity. Inherent to the clinical setting, they were able to assess a larger sample of patients with hypoxaemia; 65 patients and 22 patients with oxygen saturations <95% and <90%, respectively. Compared to our study, their findings showed stronger positive correlations between counting number and SpO2 (r = 0.67; P < 0.001), as well as between counting time and SpO2 (r = 0.59; P < 0.001). With a counting number’s AUC of 0.83 and a counting time’s AUC of 0.76 for identifying SpO2 < 95%, their findings showed comparable overall test performance of the Roth score (Chorin et al., Reference Chorin, Padegimas, Havakuk, Birati, Shacham, Milman, Topaz, Flint, Keren and Rogowski2016). However, further evaluation of the diagnostic accuracy results showed a completely opposite gradient of sensitivity and specificity for the same cut-off values of counting number and counting time, resulting in different optimal cut-off values for daily clinical practice. They found a maximal counting number <15 and counting time <8 s to be associated with sensitivities of 83% and 78% and specificities of 71% and 71% for identifying hypoxaemia, respectively, (Chorin et al., Reference Chorin, Padegimas, Havakuk, Birati, Shacham, Milman, Topaz, Flint, Keren and Rogowski2016). Unfortunately, we were not able to deduce the cause of these significant differences from their study design.
Resulting from the growing need for a tool that signals hypoxaemia remotely during the pandemic, the Roth score was advertised as an easily accessible telephone triage tool on multiple internet websites and was quickly incorporated into clinical guidelines (The Centre for Evidence-Based Medicine, 2020b). However, after GPs applied it inappropriately with negative consequences, the use of the score was dissuaded by the Oxford COVID-19 Evidence Service Team and the Royal College of General Practitioners (The Centre for Evidence-Based Medicine, 2020b, The Royal College of General Practitioners, 2020). They recommended an overall clinical assessment instead of using the Roth score for the remote assessment of hypoxaemia, due to its high false-positive and false-negative rate (The Centre for Evidence-Based Medicine, 2020a, The Centre for Evidence-Based Medicine, 2020b). Even though we believe that using solely the optimal cut-off value might diminish these errors and subsequent unwarranted policymaking, we would like to emphasise the importance of using the Roth score as part of an overall clinical assessment. The Roth score is not meant to substitute a holistic clinical assessment, but rather provides an additional tool that can contribute to the decision to direct a patient towards hospital admission.
Implications for clinical practice and future directions
Current Dutch remote triage protocols in general practice focus on (alarm) symptoms, signs of haemodynamic instability and risk factors for complicated disease (KS, 2020). We regard the Roth score as an easily executable, low-resource and inexpensive instrument, which is potentially applicable to current telemedicine practice. Based on our results, a counting number with a cut-off value of 20 might be of additional value to signalling hypoxaemia remotely, although it likely will not provide sufficient accuracy to use as a replacement of an overall clinical assessment. Considering its sensitivity, we would advise in-person assessment for a counting number below 21 to rule out hypoxaemia, and provided that further clinical assessment does not give rise to in-person evaluation, conservative management if a patient counts higher than 20 in one breath.
We believe caution is warranted with using counting time in clinical practice since we found no significant correlation with pulse oximetry, and this measurement is subject to more external variables than counting number, subsequently affecting the clinical relevance of its result. Moreover, the measurement of solely counting number is more user-friendly and time-saving in clinical practice, where consultation time is limited and irrelevant proceedings are undesirable.
Finally, we believe that our study provides the groundwork for future studies to validate the Roth score in general practice. These studies should concentrate on external validation with larger sample sizes, older age categories and changes in the Roth score due to relevant cardiopulmonary comorbidities. Moreover, future studies should investigate the feasibility of the Roth score in a telemedicine setting, inter and intra user reproducibility and user-friendliness of the test. These studies are required before recommending the integration of the Roth score (i.e., counting number measurement) in triage protocols.
The Roth score’s counting number, with an in-person assessment cut-off value of 20, is potentially of added value for signalling hypoxaemia remotely in patients with possible COVID-19 in general practice. We consider this measurement easily executable in general practice and potentially applicable to telemedicine practice. However, before recommending integration in the current triage protocols, external validation studies with larger sample sizes are warranted. We advise caution with using counting time in clinical practice, as we found no additional value of this measurement.
We would like to thank all participating physicians and patients for their cooperation in performing this study.
RH and CB designed the study protocol. CB, RH and JC were involved in the recruitment of GPs for participation in data collection. CB performed the statistical analysis and interpreted all data with statistical guidance from JH and RH. CB drafted the manuscript and all authors contributed to its revision. All authors read and approved the final manuscript.
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Conflicts of interest
A waiver was granted by the Medical Ethics Review Committee of the Amsterdam University Medical Centre – Academic Medical Centre.
Patient consent for publication
Not required, considering the cross-sectional nature of this study with the use of solely anonymous clinical data.
Provenance and peer review
Not commissioned; externally peer reviewed.
All data relevant to the study are included in the article or uploaded as an online supplement. The datasets used during the current study are available from the corresponding author on reasonable request.
To view supplementary material for this article, please visit https://doi.org/10.1017/S1463423621000347.