Predicting respiratory failure for COVID-19 patients in Japan: a simple clinical score for evaluating the need for hospitalisation

Predicting the need for hospitalisation of patients with coronavirus disease 2019 (COVID-19) is important for preventing healthcare disruptions. This observational study aimed to use the COVID-19 Registry Japan (COVIREGI-JP) to develop a simple scoring system to predict respiratory failure due to COVID-19 using only underlying diseases and symptoms. A total of 6873 patients with COVID-19 admitted to Japanese medical institutions between 1 June 2020 and 2 December 2020 were included and divided into derivation and validation cohorts according to the date of admission. We used multivariable logistic regression analysis to create a simple risk score model, with respiratory failure as the outcome for young (18–39 years), middle-aged (40–64 years) and older (≥65 years) groups, using sex, age, body mass index, medical history and symptoms. The models selected for each age group were quite different. Areas under the receiver operating characteristic curves for the simple risk score model were 0.87, 0.79 and 0.80 for young, middle-aged and elderly derivation cohorts, and 0.81, 0.80 and 0.67 in the validation cohorts. Calibration of the model was good. The simple scoring system may be useful in the appropriate allocation of medical resources during the COVID-19 pandemic.


Introduction
Coronavirus disease 2019 (COVID- 19) shows a variety of clinical presentations. Most patients are asymptomatic or mildly ill at the time of diagnosis and then recover, but some require oxygen therapy, and more severe cases may require mechanical ventilation or may even die [1][2][3].
In Japan, COVID-19 patients were hospitalised in principle at the beginning of the pandemic [4]. However, as the number of patients requiring hospitalisation increased due to the spread of the infection and securing hospital beds for patients with severe disease became difficult, patients with mild disease were placed in facilities for recuperating (hotels provided by the government, etc.) or sent home. Even in mild cases, individuals who fell into one of the following categories were recommended for hospitalisation until at least 10 days after disease onset: (i) elderly individuals (≥65 years old); (ii) individuals with underlying diseases; (iii) individuals in an immunosuppressed state; (iv) individuals who were pregnant; and (v) individuals judged by doctors to require hospitalisation [5]. These hospitalisations have been coordinated by regional healthcare systems. Most hospitalised patients did not require oxygen therapy during the course of the disease [6], and many of these patients were discharged from the hospital after observation alone.
As of 25 March 2021, a cumulative total of 458 539 infections and 8936 deaths had been confirmed in Japan [7], and the number of patients has increased explosively since the second wave began in June 2020 [8]. In January 2021, securing inpatient beds became particularly difficult. In Tokyo, as the city with the largest number of patients, COVID-19 inpatients took up 84% of available beds (as of 13 January 2021) [9], and the number of patients requiring admission to an intensive care unit (ICU) was greater than the number of available ICU beds (113% as of 27 January 2021) [10].
From the perspective of efficient allocation of medical resources, in situations where the number of patients increases rapidly to the extent that securing inpatient beds is difficult, hospitalisation of patients with high oxygen demands (i.e. patients with a high possibility of developing respiratory failure) must be prioritised. In addition, in situations where patients are placed in recuperating facilities or sent home, and where detailed vital signs, physical findings and blood sampling data are unavailable, having an index for follow-up that focuses on patients with the highest possibility of developing respiratory failure is useful.
Most prognostic models for COVID-19 developed to date have focused on predicting death or ICU admission, and have required vital signs, blood sampling data and imaging studies [11]. These prediction models have been developed with small sample sizes, and relatively few models have predicted the need for oxygen therapy using large databases. This study aimed to develop and validate a simple clinical risk score to predict the need for oxygen therapy using only patient demographic characteristics, comorbidities and symptoms, based on a large registry of patients hospitalised for COVID-19 throughout Japan.

Methods
This study is reported in accordance with Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) recommendations [12].

Study population
This study was an observational study using the COVID-19 Registry Japan (COVIREGI-JP), a nationwide registry of patients hospitalised for COVID-19 [13]. Study data were collected and managed using REDCap (Research Electronic Data Capture), a secure, web-based data capture application hosted at the Joint Center for Researchers, Associates and Clinicians (JCRAC) data centre of the National Center for Global Health and Medicine (NCGM) [14]. Inclusion criteria for this study were as follows: (1) positive results for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) from polymerase chain reaction (PCR) testing; and (2) hospitalisation in a Japanese medical institution between 1 June 2020 and 2 December 2020. Exclusion criteria were as follows: (1) >14 days from symptom onset to hospitalisation; (2) already hospitalised at the time of COVID-19 diagnosis; (3) transferred from the reporting hospital to another hospital; (4) age <18 years; (5) pregnant; (6) receiving haemodialysis; or (7) missing values for outcome. This study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Review Committee at the NCGM (approval number: NCGM-G-003494-0). Information regarding opting out of this study is available on the registry website.

Outcome and predictor variables
Primary outcome was the presence or absence of oxygen therapy (by any method) during hospitalisation. Predictor variables were selected as clinically important factors or those identified as risk factors for severe disease in previous studies [15][16][17][18][19][20][21][22], and data at the time of admission were used (Supplementary Table S1). The 19 candidate factors were demographic characteristics (age, sex, body mass index (BMI)), presence of symptoms (fever, cough, shortness of breath (SOB), wheezing, or fatigue) and presence of comorbidities (hypertension, hyperlipidaemia, diabetes mellitus, congestive heart failure, cerebrovascular disease, bronchial asthma, chronic lung disease other than bronchial asthma, chronic liver disease, moderate to severe renal dysfunction, collagen disease or malignant disease). According to the classification of BMI for Asians proposed by the World Health Organization [23,24], BMI was classified into five categories: <18.5, 18.5-22.9, 23.0-24.9, 25.0-29.9 or ≥30 kg/m 2 .

Statistical analysis
The study population was divided into two cohorts according to the date of admission: derivation cohort (from 1 June 2020 to 10 September 2020) and validation cohort (from 11 September 2020 to 2 December 2020) [12,25]. Summary statistics of baseline characteristics at admission for each cohort were calculated and expressed appropriately as mean (standard deviation), median (interquartile range) or count (percentage), as appropriate.

Model development
According to previous studies, the proportion of young patients who required respiratory support was much lower than that of  elderly patients [6], and some risk factors for severe disease were reported to be highly heterogeneous according to age [26]. In this study, we divided the study population into three age groups: young, 18-39 years old; middle-aged, 40-64 years old; and elderly, ≥65 years old [27,28]. We then created a model for each age group. The study population was further categorised into the following age strata: 18-29, 30-39, 40-49, 50-59, 60-64, 65-74 and ≥75 years old. All 19 predictor variables were entered into a logistic regression model with the presence or absence of oxygen administration during hospitalisation as the outcome. Variables were selected by the backward elimination method, and regression coefficients of each variable were used to create a comprehensive model [25,29]. In addition, a simple risk score model was created by dividing all regression coefficients by the smallest of the binary predictor regression coefficients in the comprehensive model and converting values to the nearest integer [25].

Evaluation of performance
To evaluate the performance of the simple risk score model, areas under the receiver operating characteristic curves (AUCs) for discrimination were calculated separately for the derivation cohort and validation cohort. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) at various cutoff points were also calculated. Finally, to evaluate agreement between the predicted and observed probabilities, we created a calibration plot with the predicted probability on the x-axis and  the observed probability on the y-axis using a comprehensive model.

Missing data handling and sensitivity analysis
Among the collected data on symptoms, those with entries of 'unknown' were treated as missing values, and missing data for symptoms were treated as 'no symptoms'. For the 17.1% of patients with missing BMI data, the missing data were complemented using single imputation methods as follows. In primary analysis, missing BMI data were complemented by one of the five categories using multivariable logistic regression with the following predictors: age, sex, symptoms (fever, cough, SOB, Epidemiology and Infection wheezing and fatigue) and presence of comorbidities (hypertension, hyperlipidaemia, diabetes mellitus, congestive heart failure, cerebrovascular disease, bronchial asthma, chronic lung disease other than bronchial asthma, chronic liver disease, moderate to severe renal dysfunction, collagen disease and malignant disease). In sensitivity analyses, for patients with physician-diagnosed obesity according to registry data, BMI data were complemented by either one of two categories, 25.0-29.9 or ≥30 kg/m 2 . For patients without physiciandiagnosed obesity, data were complemented by one of the five categories according to the distribution of observed BMIs.
Since this was an observational study, all available data were used in the analysis. All P-values were two-tailed, and the statistical significance level was set at the level of P < 0.05. All analyses were performed using SAS version 9.4 software (SAS Institute, Cary, NC, USA).

Background characteristics of patients on admission
A total of 9271 patients tested positive for SARS-CoV-2 PCR during the study period, and 6873 patients were included in the final analysis, excluding those who met the exclusion criteria (Fig. 1). For the 4513 patients in the derivation cohort, median time from onset to admission was 4 days, 96.2% were Japanese, 58.8% were male, mean age was 46.9 years (S.D. = 20.1) and 20.7% required oxygen therapy (Table 1).

Model development
Odds ratios (ORs), 95% confidence intervals and P-values for each predictor as calculated by multivariable logistic regression analysis for each age group, as well as the comprehensive model and the simple risk score model calculated from it, are shown in Tables  2-4 and Supplementary Table S2a-c. Variables selected for each age group differed, with seven variables (sex, age, BMI, malignancy, fever, SOB and wheezing) in the young age group (18-39 years) and eight variables (sex, age, BMI, diabetes, fever, cough, SOB and fatigue) in the middle-aged group. In the elderly, nine variables (age, BMI, congestive heart failure, cerebrovascular disease, diabetes, hypertension, fever, cough and SOB) were selected.

Model validation
In the derivation cohort, AUCs of the simple risk score models were 0.87, 0.79 and 0.80 in the young, middle-aged and elderly   3. Sensitivity and specificity of the models for each age group at various cut-offs to detect respiratory failure. 6 Gen Yamada et al.
groups, respectively. In the validation cohort, AUCs from simple risk score models were 0.81, 0.80 and 0.67 in the young, middle-aged and elderly groups, respectively (Fig. 2). Figure 3 and Supplementary Table S3 show the sensitivity, specificity, PPV and NPV of models for each age group at various cut-offs. Figure 4 shows calibration plots for the derivation cohort and validation cohort. In the sensitivity analysis, the models obtained by the analysis of imputed data, when missing BMI data were complemented according to the distribution of observed BMIs, were not significantly different from those obtained by primary analysis when missing BMI data were imputed using the regression method (Supplementary Table S4 and Fig. S1).

Discussion
This study developed a simple score to predict the need for oxygen therapy among COVID-19 patients using age, sex, BMI, underlying diseases, and symptoms for young, middle-aged and elderly age groups. We found that although BMI and SOB were factors with relatively high impact in all age groups, the models selected for each age group were quite different. Although many predictive models for COVID-19 have been developed [11], the novelty of this study lies in the development of a model that predicts oxygen demand by age group using only the underlying diseases of the patient and symptom data from a large dataset.
Among the prediction models developed in this study, BMI, diabetes mellitus, hypertension, congestive heart failure, cerebrovascular disease, malignancy, fever and cough have been selected in previous studies [16,[30][31][32][33][34] and were consistent with previous results. PCR testing has become widely available, and COVID-19 has been diagnosed in general clinics and testing centres. In addition, telemedicine and telephone follow-up have increased since the start of the COVID-19 pandemic [35]. In this situation, using information such as blood sampling data, physical examinations and oxygen saturation is difficult. Therefore, the models in this study appear useful to predict patients likely to experience respiratory failure in the future and to support the decision-making of healthcare providers. COVID-19 patients have been pointed out to be potentially unaware of dyspnoea even if they are hypoxic [36,37]. The present models may also be clinically applicable to patients with mild or asymptomatic disease who are observed at home or in an institution to closely monitor high-risk patients. In addition, under situations such as a pandemic, where the workload on healthcare workers and health centre staff is high, a simple model that can be calculated using only a pen and paper is more convenient than a complicated predictive model that requires an online computer.
The use of this score in the field might vary depending on the local COVID-19 caseload. For example, it can be used as a criterion to determine the indication for hospitalisation in situations where many patients require oxygenation, or when the number of cases is low, it can be used to select the appropriate healthcare-related facilities for patients with differing levels of disease severity.
We propose that the cut-off value for the simple score model of this study should be ≥6 for the model of young people (18-39 years old), ≥5 for the model of middle-aged people (40-64 years old) and ≥3 for the model of elderly people (≥65 years old). The reasons are as follows. In the younger population, fewer patients

Epidemiology and Infection
progress to respiratory failure, so lowering the cut-off will lower the PPV and increase unnecessary hospitalisations. In the younger population, where remaining physical capacity is usually much greater, selection of high-risk patients for inpatient care or intensive follow-up is preferable. In the middle-aged population, where about one-third of patients develop respiratory failure, a cut-off of ≥5 provides 85% sensitivity, and nearly half of the hospitalised patients who do not require oxygen can be classified as low-risk. In the elderly population, about half develop respiratory failure during hospitalisation. For the elderly, it is important not to overlook those who would suffer from respiratory failure, and the cutoff should thus be set as low as possible. A cut-off value of ≥3 provides 97% sensitivity, a PPV of 56% and an NPV of 90%, representing a reasonable criterion for making decisions about hospitalisation for this disease with a high mortality rate.

Limitations
Various limitations to this investigation need to be considered when interpreting the results. This study was analysed using registry data from COVID-19 patients hospitalised in Japan. The population used to develop the model is thus expected to be sicker than the overall population of COVID-19 patients, including those who have not been hospitalised. However, in Japan, COVID-19 patients ≥65 years old and patients <65 years old with underlying diseases were recommended to be hospitalised in principle. This study cohort may thus offer a relatively good reflection of the COVID-19 population. In the younger age group (18-39 years), the effects of predictor variables may not have been properly assessed because of the small number of patients in whom the endpoint occurred. When using this prediction model in clinical practice, it is important to note that the endpoint used in this study was the presence or absence of oxygen therapy. Patients may require hospitalisation for supportive care for medical conditions other than respiratory failure, such as dehydration or poor food intake. Therefore, even if a patient gives a negative result under the prediction model described in this study, follow-up of the patient based on the clinical situation is warranted. The performance of the prediction model may change in the future with the extensive use of vaccines and the spread of viral mutations. Since most patients analysed were ethnically Japanese, the validity of the model in other populations needs to be verified. It should be noted that some patients who are considered to have no underlying medical conditions may actually have a high risk, as they may not be aware of their own health status or may not have visited a medical institution for a long period of time. Finally, the performance of the model for elderly people (≥65 years old) decreased when applied to the validation cohort. A possible reason for this may be that we have a lot of cluster cases at nursing homes included in the validation cohort, so that the characteristics of elderly people in the validation cohort may have been much different from the characteristics of those in the derivation cohort [38]. Therefore, further studies to update the model are needed.

Conclusion
Using data from the large COVID-19 hospitalisation registry in Japan, we developed a simple predictive score model to predict respiratory failure based solely on the underlying diseases and symptoms of patients. A simple risk score based on readily available information is highly versatile and may be useful during pandemics.
Supplementary material. The supplementary material for this article can be found at https://doi.org/10.1017/S0950268821001837.