
A simple model for the total number of SARS-CoV-2 infections on a national level

Published online by Cambridge University Press: 25 March 2021

N. Blanco
Affiliation:
Center for International Health, Education, and Biosecurity, Institute of Human Virology-University of Maryland School of Medicine, Baltimore, Maryland, USA
K. A. Stafford
Affiliation:
Center for International Health, Education, and Biosecurity, Institute of Human Virology-University of Maryland School of Medicine, Baltimore, Maryland, USA; Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, Maryland, USA
M. C. Lavoie
Affiliation:
Center for International Health, Education, and Biosecurity, Institute of Human Virology-University of Maryland School of Medicine, Baltimore, Maryland, USA
A. Brandenburg
Affiliation:
Nordita, KTH Royal Institute of Technology and Stockholm University, SE-10691, Stockholm, Sweden
M. W. Górna
Affiliation:
Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
M. Merski*
Affiliation:
Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
*Author for correspondence: M. Merski, E-mail: merski@gmail.com; mgorna@chem.uw.edu.pl

Abstract

This study aimed to identify an appropriate simple mathematical model to fit the number of coronavirus disease 2019 (COVID-19) cases at the national level during the early portion of the pandemic, before significant public health interventions could be enacted. The total number of COVID-19 cases over time in 28 countries was analysed and fit to several simple rate models. The resulting model parameters were used to extrapolate projections for more recent data. While the Gompertz growth model (mean R² = 0.998) best fit the current data, uncertainties in the eventual case limit introduced significant model errors. Among the remaining models, the quadratic rate model (mean R² = 0.992) fit the current data best for 25 (89%) countries, as determined by R² values. Projection into the future using the simple quadratic model accurately forecast the total number of cases 50% of the time up to 10 days in advance. Extrapolation into the future with the simple exponential model significantly overpredicted the total number of future cases. These results demonstrate that accurate predictions of the future case load in a given country can be made using this very simple model.
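As a minimal sketch of the fit-and-project workflow summarised above, the following Python fits a quadratic N(t) = a*t^2 + b*t + c by ordinary least squares, reports R² and extrapolates 10 days past the last fitted day. The parameterisation, the helper names (fit_quadratic, r_squared) and the synthetic, exactly quadratic case counts are assumptions for illustration, not the authors' data or code; because the data are noise-free, R² is 1 by construction.

```python
def fit_quadratic(t, n):
    """Least-squares fit of n ~ a*t**2 + b*t + c via the 3x3 normal equations."""
    cols = [[x * x for x in t], [float(x) for x in t], [1.0] * len(t)]
    # Augmented matrix [X^T X | X^T n] for the design columns t^2, t, 1.
    m = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols]
         + [sum(p * q for p, q in zip(ci, n))] for ci in cols]
    for k in range(3):  # Gaussian elimination with partial pivoting
        piv = max(range(k, 3), key=lambda i: abs(m[i][k]))
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, 3):
            f = m[i][k] / m[k][k]
            for j in range(k, 4):
                m[i][j] -= f * m[k][j]
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        coef[i] = (m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))) / m[i][i]
    return tuple(coef)


def r_squared(t, n, model):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(n) / len(n)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(t, n))
    ss_tot = sum((y - mean) ** 2 for y in n)
    return 1.0 - ss_res / ss_tot


# Hypothetical cumulative totals for the first 15 days of an outbreak.
days = list(range(15))
cases = [3 * d * d + 2 * d + 5 for d in days]
a, b, c = fit_quadratic(days, cases)


def model(x):
    """Fitted quadratic used for both R^2 and extrapolation."""
    return a * x * x + b * x + c


print(f"a={a:.3f} b={b:.3f} c={c:.3f} R^2={r_squared(days, cases, model):.4f}")
print(f"projected total 10 days after the last fit day: {model(days[-1] + 10):.0f}")
```

With real counts the residuals are nonzero, so R² drops below 1 and the projection carries the fit's uncertainty forward.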

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is included and the original work is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press

Fig. 1. Illustrative comparison of exponential, quadratic, generalised logistic and Gompertz growth curves. The Gompertz growth curve (Equation 4, solid black line) represents the progress of a theoretical epidemic for a disease with an arbitrarily chosen R0 value of 3.4 (r = 0.045, N0 = 1, NM = 70%, dotted line). The solid grey line is an equivalent logistic curve; note that although both curves have the same midpoint, the Gompertz curve approaches the population carrying capacity more slowly, resulting in a long-tailed epidemic. The initial part of the Gompertz curve (all time points until 5% of the population has been infected) was fit to the simple exponential (red dashes), quadratic (blue dashes) and simple square (green dashes) models. These curves show how quickly the exponential fit overestimates the rate of growth of the epidemic compared with the quadratic and simple square fits, and how the quadratic model follows the Gompertz growth curve more closely, as evidenced by the smaller Sy.x value for the quadratic fit in Table 1.
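To illustrate the comparison in Fig. 1, the sketch below generates the early part of a Gompertz curve with the caption's parameters (r = 0.045, N0 = 1, NM = 70% of the population), fits the three simple models and reports Sy.x, the standard error of the estimate sqrt(SSR / (n - p)). The Gompertz parameterisation, the hypothetical population of 1,000,000 and the log-linear fit for the exponential model (a simplification in place of a nonlinear fit on the raw counts) are assumptions, so the numbers will not reproduce Table 1. In this setting the exponential fit overshoots the last early-phase point and the quadratic fit has the smaller Sy.x.

```python
import math


def lstsq(cols, y):
    """Ordinary least squares via the normal equations; cols are design-matrix columns."""
    k = len(cols)
    m = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            for j in range(c, k + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        x[i] = (m[i][k] - sum(m[i][j] * x[j] for j in range(i + 1, k))) / m[i][i]
    return x


def gompertz(t, n0, n_max, r):
    """Assumed form N(t) = N_M * exp(ln(N0 / N_M) * exp(-r * t)), so N(0) = N0."""
    return n_max * math.exp(math.log(n0 / n_max) * math.exp(-r * t))


def sy_x(obs, pred, n_params):
    """Standard error of the estimate: sqrt(SSR / (points - parameters))."""
    ssr = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return math.sqrt(ssr / (len(obs) - n_params))


POP = 1_000_000                        # hypothetical population size
R, N0, N_MAX = 0.045, 1.0, 0.70 * POP  # r, N0 and N_M = 70 % from the Fig. 1 caption

# Early epidemic: every whole day until 5 % of the population has been infected.
days = []
while gompertz(len(days), N0, N_MAX, R) < 0.05 * POP:
    days.append(float(len(days)))
cases = [gompertz(d, N0, N_MAX, R) for d in days]
ones, sq = [1.0] * len(days), [d * d for d in days]

# Simple exponential N = exp(b0 + b1*t), fitted log-linearly (an assumed simplification).
b0, b1 = lstsq([ones, days], [math.log(n) for n in cases])
exp_pred = [math.exp(b0 + b1 * d) for d in days]
# Quadratic N = a*t^2 + b*t + c.
qa, qb, qc = lstsq([sq, days, ones], cases)
quad_pred = [qa * d * d + qb * d + qc for d in days]
# Simple square N = a*t^2 (one parameter).
(sa,) = lstsq([sq], cases)
square_pred = [sa * d * d for d in days]

scores = {
    "exponential": sy_x(cases, exp_pred, 2),
    "quadratic": sy_x(cases, quad_pred, 3),
    "simple square": sy_x(cases, square_pred, 1),
}
for name, s in scores.items():
    print(f"{name:>13}: Sy.x = {s:,.0f}")
```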


Table 1. Average statistical fit parameters for the quadratic, simple square, simple exponential and Gompertz growth models for the early course of the epidemic in each country


Fig. 2. The development of COVID-19 cases over time in 28 nations. The total number of cases as of 1 June 2020 is indicated by black circles, while the early part of the curve is indicated by orange triangles. A quadratic curve fit to the early part of the data and extrapolated into the future is shown as an orange dashed line. The first day of each fit curve is listed for each country. The black circles are obscured in those countries that had not begun to effectively reduce SARS-CoV-2 spread by 1 June 2020. Dashed grey lines indicate the range of possible values corresponding to the 95% confidence interval around the predicted number of cases [58].
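One standard way to draw a 95% band around an extrapolated quadratic, sketched here under stated assumptions, propagates the parameter covariance s²(XᵀX)⁻¹ to the predicted value at day t0 and uses a normal quantile in place of the Student t quantile. This is a textbook confidence band for the fitted mean curve; whether it matches the method of reference [58] is not established here, and the helper names (fit_with_covariance, band) and the synthetic data are hypothetical.

```python
import math
from statistics import NormalDist


def fit_with_covariance(t, n):
    """Quadratic least squares n ~ a*t^2 + b*t + c; returns (coefficients, covariance)."""
    rows = [[x * x, x, 1.0] for x in t]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    # Invert X^T X by Gauss-Jordan elimination on [X^T X | I].
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for k in range(3):
        piv = max(range(k, 3), key=lambda i: abs(aug[i][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        p = aug[k][k]
        aug[k] = [v / p for v in aug[k]]
        for i in range(3):
            if i != k:
                f = aug[i][k]
                aug[i] = [vi - f * vk for vi, vk in zip(aug[i], aug[k])]
    inv = [row[3:] for row in aug]
    xty = [sum(r[i] * y for r, y in zip(rows, n)) for i in range(3)]
    coef = [sum(inv[i][j] * xty[j] for j in range(3)) for i in range(3)]
    resid = [y - sum(cf * v for cf, v in zip(coef, r)) for r, y in zip(rows, n)]
    s2 = sum(e * e for e in resid) / (len(n) - 3)  # residual variance
    cov = [[s2 * inv[i][j] for j in range(3)] for i in range(3)]
    return coef, cov


def band(coef, cov, t0, level=0.95):
    """(lower, predicted, upper) for the fitted mean curve at day t0."""
    x0 = [t0 * t0, t0, 1.0]
    pred = sum(cf * v for cf, v in zip(coef, x0))
    var = sum(x0[i] * cov[i][j] * x0[j] for i in range(3) for j in range(3))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation to the t quantile
    half = z * math.sqrt(var)
    return pred - half, pred, pred + half


days = list(range(15))
cases = [3 * d * d + 2 * d + 5 + (4 if d % 2 else -4) for d in days]  # synthetic data
coef, cov = fit_with_covariance(days, cases)
for day in (20, 30, 40):
    lo, mid, hi = band(coef, cov, day)
    print(f"day {day}: {mid:.0f} (95% band {lo:.0f} to {hi:.0f})")
```

The band widens the further the projection moves beyond the fitted days, which is why long-range extrapolations carry wide intervals.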


Fig. 3. Binary scoring of future total-case predictions from the four models. (a) Median success rate, by number of days in the future, for every fit data point (the orange points in Fig. 2), where a success is a prediction within the 95% CI for the fit parameters. (b) Mean normalised width of the 95% CI of the predictions for all 28 countries, up to 60 days in advance. (c) Mean prediction bias across all 28 countries for all four models. (d) Reward score for all 28 countries for all four prediction methods.
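The per-prediction quantities behind panels (a) to (c) can be sketched as follows. The exact definitions are assumptions: a success is taken to be an observed total lying inside the predicted 95% interval, the CI width is normalised by the observed total, and bias is the signed relative error. The reward score in panel (d) is not defined in this caption and is therefore not reproduced; all numbers are hypothetical.

```python
from statistics import mean, median


def score(records):
    """records: (predicted, ci_low, ci_high, observed) tuples for one country and horizon.

    Returns (success rate, mean normalised CI width, mean relative bias).
    """
    hits = [lo <= obs <= hi for _, lo, hi, obs in records]
    widths = [(hi - lo) / obs for _, lo, hi, obs in records]   # assumed: width / observed total
    bias = [(pred - obs) / obs for pred, _, _, obs in records]  # assumed: signed relative error
    return sum(hits) / len(hits), mean(widths), mean(bias)


# Hypothetical predictions for one country at one horizon.
country_a = [(110, 90, 130, 100), (260, 240, 280, 200), (390, 300, 480, 400)]
rate, width, bias = score(country_a)
print(f"success rate {rate:.2f}, mean width {width:.3f}, mean bias {bias:+.3f}")

# Panel (a) summarises success rates across countries with the median.
rates = [rate, 1.0, 0.5]  # hypothetical rates for three countries
print(f"median success rate {median(rates):.3f}")
```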


Fig. 4. Comparison of the errors in final-day prospective predictions of COVID-19 case numbers in 28 countries for the simple exponential model (red triangles), the simple square model (green squares), the quadratic model (black circles), the Gompertz growth model (blue triangles) and the basic logistic growth model (purple diamonds). The first day of each fit curve is listed for each country. Note the logarithmic vertical axis, which shows the ratio of the predicted to the observed number of cases. In each graph, each model is fit using only the data available up to a given day, and the fitted values are used to predict the number of cases expected on the last day for which data are available (or the last day before significant curve deviation is observed; see Fig. 2). Days on which the Gompertz fit was not statistically sound were omitted from the graph.
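The prospective procedure in Fig. 4 (refit on the data up to each cutoff day, then predict the final day) can be sketched for the two closed-form models. This assumes synthetic counts that grow exactly as 50*t^2, a one-parameter least-squares fit for the simple square model and a log-linear fit for the exponential model; these are illustrative choices, not the authors' fitting procedure.

```python
import math


def fit_square(t, n):
    """One-parameter least squares for N = a*t^2: a = sum(t^2 * N) / sum(t^4)."""
    return sum(x * x * y for x, y in zip(t, n)) / sum(x ** 4 for x in t)


def fit_exponential(t, n):
    """Log-linear least squares for N = exp(b0 + b1*t) (assumed simplification)."""
    logs = [math.log(y) for y in n]
    tm, lm = sum(t) / len(t), sum(logs) / len(logs)
    b1 = sum((x - tm) * (y - lm) for x, y in zip(t, logs)) / sum((x - tm) ** 2 for x in t)
    return lm - b1 * tm, b1


# Synthetic cumulative counts (hypothetical): exactly 50*t^2 on days 1..30.
days = list(range(1, 31))
cases = [50 * d * d for d in days]
final_day, final_obs = days[-1], cases[-1]

ratios = []  # (last fitted day, square ratio, exponential ratio)
for cutoff in range(3, len(days)):
    t, n = days[:cutoff], cases[:cutoff]  # only data available up to the cutoff
    a = fit_square(t, n)
    b0, b1 = fit_exponential(t, n)
    ratio_sq = a * final_day ** 2 / final_obs
    ratio_exp = math.exp(b0 + b1 * final_day) / final_obs
    ratios.append((t[-1], ratio_sq, ratio_exp))
    print(f"cutoff day {t[-1]:2d}: log10 ratio square {math.log10(ratio_sq):+.2f}, "
          f"exponential {math.log10(ratio_exp):+.2f}")
```

Because ln N grows more slowly than linearly here, every log-linear exponential fit overshoots the final day (log10 ratio above 0), while the square model recovers the generating curve exactly.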


Table 2. Results of prospective final-day predictions of total case load made using the various models


Table 3. Fraction of successful predictions of the incidence of new COVID-19 cases using the quadratic model. The incidence (number of new cases per day) was calculated by subtracting the total number of cases on the previous day from the total number of cases on the current day.
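The differencing described in the Table 3 caption is a one-liner; the sketch also shows the daily increment implied by a fitted quadratic N(t) = a*t^2 + b*t + c, namely N(d) - N(d - 1) = a(2d - 1) + b. The quadratic parameterisation, the function names and the example counts are hypothetical and chosen for illustration.

```python
def incidence(totals):
    """New cases per day: each day's total minus the previous day's total."""
    return [cur - prev for prev, cur in zip(totals, totals[1:])]


def predicted_incidence(a, b, day):
    """Daily increment of a fitted quadratic N(t) = a*t^2 + b*t + c between day-1 and day."""
    # N(day) - N(day - 1) = a*(2*day - 1) + b; the constant c cancels.
    return a * (2 * day - 1) + b


totals = [5, 10, 21, 38, 61]  # hypothetical cumulative counts for days 0..4
print(incidence(totals))  # [5, 11, 17, 23]
print([predicted_incidence(3, 2, d) for d in range(1, 5)])  # [5, 11, 17, 23]
```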

Supplementary material: File

Blanco et al. supplementary material
