At any given time, most COVID-19 cases are circulating in the community and not known to us.
We wish to estimate the total current size of the COVID-19 outbreak (the total number of unnotified individuals currently infected with SARS-CoV-2).
We can estimate this number from either of two sources of data: case notification reports or fatality reports.
Case reports do not include infected persons who are symptomatic but not yet diagnosed, nor infected persons with latent infections (people who carry the virus but are not yet symptomatic). In addition, a number of individuals may be infected but never develop symptoms. These asymptomatic infections may still transmit the virus.
To estimate the size of the outbreak at any given time from case reports, we first estimate the number of symptomatic and latent infections in the community, accounting for cases that would be missed because they would never show symptoms. The estimates of symptomatic and latent infections are only valid up to some time before the present, so we forecast from that point to the present. At any time, the sum of symptomatic and latent infections is our estimate of the total number of unnotified individuals currently infected with SARS-CoV-2.
The infection fatality rate (IFR) of COVID-19 is known with a reasonable degree of precision to be around 0.9%. We reasoned that fatal cases are much less likely to be overlooked than non-fatal cases (particularly asymptomatic cases and cases with mild symptoms). Additionally, the distribution of time from onset of symptoms to death has been estimated. Taken together, these pieces of information allow the construction of a symptom onset curve in a manner analogous to that used for case notification data.
The ratio of the two different estimates for the number of persons acquiring infection on any given day is an estimate of the ascertainment rate.
The ascertainment rate tells us how many infections we don’t know about compared with those we do know about.
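The ratio described above can be sketched in a few lines. This is only an illustration: the function name and the input numbers are hypothetical, and the direction of the ratio (case-based estimate divided by fatality-based estimate) is our assumption, since the text does not state which estimate is the numerator.

```python
def ascertainment_rate(infections_from_cases, infections_from_deaths):
    """Daily ratio of the case-based estimate to the fatality-based estimate.

    Assumption: the case-based estimate is the numerator.
    """
    rates = []
    for from_cases, from_deaths in zip(infections_from_cases, infections_from_deaths):
        # The ratio is undefined on days where the fatality-based estimate is zero.
        rates.append(from_cases / from_deaths if from_deaths > 0 else None)
    return rates


# Hypothetical daily estimates of new infections, for illustration only.
from_cases = [20.0, 30.0, 45.0]
from_deaths = [100.0, 120.0, 0.0]
print(ascertainment_rate(from_cases, from_deaths))  # [0.2, 0.25, None]
```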
From case notification reports (the number of newly confirmed cases on each day), we use Monte Carlo simulation and estimated probability distributions to construct a synthetic line list of symptom onset times for all cases that are eventually notified. The average delay from symptom onset to case notification for the US is 4.4 days[1]. Simulated symptom onset times are distributed according to a gamma distribution with a mean of 4.4 days before case notification (shape parameter: 1.9816403).
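A gamma distribution specified by its mean and shape has scale equal to mean divided by shape. The sketch below draws onset-to-notification delays with the parameters quoted above, using only the Python standard library. The function names, the seed, and the sample size are illustrative assumptions, not part of the original analysis.

```python
import random


def gamma_scale(mean, shape):
    """Scale parameter of a gamma distribution, since mean = shape * scale."""
    return mean / shape


def sample_onset_delays(n_cases, mean=4.4, shape=1.9816403, seed=0):
    """Draw one onset-to-notification delay (in days) per notified case."""
    rng = random.Random(seed)
    scale = gamma_scale(mean, shape)
    # random.gammavariate takes (shape, scale) in that order.
    return [rng.gammavariate(shape, scale) for _ in range(n_cases)]


delays = sample_onset_delays(100_000)
print(sum(delays) / len(delays))  # sample mean, expected to be close to 4.4
```

Subtracting each delay from a case's notification date gives its simulated symptom onset date.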
A second simulation generates synthetic exposure dates for each infected individual. The average incubation period for the US is 7.67 days[2]. Individual simulated exposure times are distributed according to a gamma distribution with a mean of 7.67 days before symptom onset (shape parameter: 4.18).
Now we have a synthetic line list that generates the observed number of case notifications and includes exposure date, symptom onset date, and notification date.
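The two simulations can be chained to produce one record per notified case. The following is a minimal sketch under stated assumptions: days are represented as numeric offsets rather than calendar dates, and the `Case` record, the function name, and the input counts are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Case:
    exposure_day: float
    onset_day: float
    notification_day: int


def build_line_list(daily_notifications, seed=0):
    """Simulate exposure and onset days for every notified case.

    daily_notifications[d] is the number of cases notified on day d.
    """
    rng = random.Random(seed)
    # Gamma parameters from the text; scale = mean / shape.
    onset_shape, onset_scale = 1.9816403, 4.4 / 1.9816403
    incub_shape, incub_scale = 4.18, 7.67 / 4.18
    cases = []
    for day, count in enumerate(daily_notifications):
        for _ in range(count):
            onset = day - rng.gammavariate(onset_shape, onset_scale)
            exposure = onset - rng.gammavariate(incub_shape, incub_scale)
            cases.append(Case(exposure, onset, day))
    return cases


# Hypothetical notification counts for four consecutive days.
line_list = build_line_list([2, 0, 5, 9])
print(len(line_list))  # 16
```

By construction, tallying `notification_day` over the line list reproduces the input counts exactly.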
From the line list, we tally the number of individuals with latent infections on each date to produce a curve of the number of latent cases in the community. We do the same to find the number of individuals with symptomatic infections on each date. These curves represent only individuals with latent and symptomatic cases that were eventually reported.
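The tally can be illustrated with a small hand-made line list. In this sketch we assume a case counts as latent from its exposure day up to (but not including) its onset day, and as symptomatic from onset up to (but not including) notification; the exact boundary convention is our assumption.

```python
def tally(line_list, first_day, last_day):
    """Count latent and symptomatic cases on each day in [first_day, last_day].

    line_list holds (exposure_day, onset_day, notification_day) tuples.
    """
    latent, symptomatic = [], []
    for day in range(first_day, last_day + 1):
        latent.append(sum(1 for e, o, n in line_list if e <= day < o))
        symptomatic.append(sum(1 for e, o, n in line_list if o <= day < n))
    return latent, symptomatic


# Three hypothetical cases: (exposure, onset, notification).
line_list = [(0, 3, 5), (1, 4, 9), (2, 8, 10)]
latent, symptomatic = tally(line_list, 0, 10)
print(latent)       # [1, 2, 3, 2, 1, 1, 1, 1, 0, 0, 0]
print(symptomatic)  # [0, 0, 0, 1, 2, 1, 1, 1, 2, 1, 0]
```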
Based on testing of passengers on the Diamond Princess cruise ship in Yokohama, Japan, Mizumoto et al. estimated that the proportion of cases that are asymptomatic is 34.6% (95% CrI: 29.4%–39.8%)[3]. We assume asymptomatic cases are never detected. To account for these, we multiply both the number of symptomatic cases and the number of latent cases by \(\frac{1}{34.6\%}\).
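A short sketch of the scaling step follows. It applies the factor exactly as stated above, \(\frac{1}{34.6\%}\). For comparison only, it also computes \(\frac{1}{1 - 34.6\%}\), which is the factor that would scale symptomatic cases up to all infections if 34.6% of infections are asymptomatic; that second factor is our assumption and is not taken from the text. The curve values are hypothetical.

```python
ASYMPTOMATIC_PROPORTION = 0.346  # point estimate quoted in the text


def scale_curve(curve, factor):
    """Multiply every daily count in a curve by a constant factor."""
    return [count * factor for count in curve]


# Factor as stated in the text.
factor_as_stated = 1 / ASYMPTOMATIC_PROPORTION
# Alternative shown for comparison only (an assumption, not from the text).
factor_alternative = 1 / (1 - ASYMPTOMATIC_PROPORTION)

reported_symptomatic = [10, 14, 21]  # hypothetical daily counts
print(round(factor_as_stated, 2), scale_curve(reported_symptomatic, factor_as_stated))
print(round(factor_alternative, 2), scale_curve(reported_symptomatic, factor_alternative))
```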
Since symptom onsets occur on average 2.67 days before reporting, we are only sure to have an accurate estimate for the number of symptomatic individuals at some time prior to 2.67 days before the present. We therefore discard estimates for the last several days (2.67 days plus a margin of 2 standard deviations), and use statistical forecasting to extrapolate from the observed trajectory of symptomatic cases to the present time. This is the “nowcast” of symptomatic individuals.
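The text does not say which forecasting method is used, so the sketch below swaps in a simple stand-in: it discards the most recent days and extrapolates a log-linear (exponential) trend fitted by least squares to the last trusted week. The function name, the fit window, and the example data are all assumptions for illustration.

```python
import math


def nowcast(curve, discard_days, fit_window=7):
    """Drop the last `discard_days` values and extrapolate them from the trend.

    Assumes all trusted values are positive, since the fit is done on logs.
    """
    trusted = curve[:len(curve) - discard_days]
    recent = trusted[-fit_window:]
    xs = range(len(recent))
    ys = [math.log(value) for value in recent]
    x_mean = sum(xs) / len(recent)
    y_mean = sum(ys) / len(recent)
    # Ordinary least-squares slope and intercept of log(count) against day.
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    last_x = len(recent) - 1
    forecast = [
        math.exp(intercept + slope * (last_x + step))
        for step in range(1, discard_days + 1)
    ]
    return trusted + forecast


# Hypothetical curve growing 10% per day, with the last 5 days under-counted.
true_curve = [100 * 1.1 ** day for day in range(20)]
observed = true_curve[:15] + [value * 0.5 for value in true_curve[15:]]
estimate = nowcast(observed, discard_days=5)
print(round(estimate[-1], 1), round(true_curve[-1], 1))
```

In this idealized example the extrapolation recovers the true final value because the underlying growth is exactly exponential; real data would carry forecast uncertainty that this sketch does not quantify.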
We repeat the process for latent cases, discarding values for times within 7.67 days (plus a margin) of the last good estimate of symptomatic individuals, and forecasting to the present. This is the “nowcast” of latent individuals.
The sum of the latent and symptomatic cases at any time provides an estimate of the total number of cases (including asymptomatic and unnotified cases).
From fatality reports (the number of newly reported deaths on each day), we use Monte Carlo simulation and estimated probability distributions to construct a synthetic line list of symptom onset times for all fatal cases. The average delay from symptom onset to death, estimated from data in the China outbreak outside Hubei, is 19.9 days[4]. Individual simulated symptom onset times are distributed according to a log-normal distribution with a mean of 19.9 days (standard deviation = 11.4 days).
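Sampling from a log-normal distribution requires the parameters of the underlying normal distribution. Assuming the quoted 19.9 and 11.4 are the mean and standard deviation of the delay itself (not of its logarithm), the conversion is \(\sigma^2 = \ln(1 + s^2/m^2)\) and \(\mu = \ln m - \sigma^2/2\). The sketch below applies it; the function names and death counts are hypothetical.

```python
import math
import random


def lognormal_params(mean, sd):
    """Convert the mean and sd of a log-normal variable to (mu, sigma)."""
    sigma_sq = math.log(1 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma_sq / 2
    return mu, math.sqrt(sigma_sq)


def simulate_onset_days(daily_deaths, mean=19.9, sd=11.4, seed=0):
    """Simulate one symptom onset day per reported death.

    daily_deaths[d] is the number of deaths reported on day d.
    """
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, sd)
    onsets = []
    for death_day, count in enumerate(daily_deaths):
        for _ in range(count):
            onsets.append(death_day - rng.lognormvariate(mu, sigma))
    return onsets


mu, sigma = lognormal_params(19.9, 11.4)
onsets = simulate_onset_days([0, 1, 4, 10])  # hypothetical daily deaths
print(len(onsets))  # 15
```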
Based on data from China, and correcting for demographics in Great Britain, Ferguson et al. estimated an Infection Fatality Rate (IFR) of 0.9% (95% CrI: 0.4%–1.4%)[5]. To account for IFR, we multiply the fatalities by \(\frac{1}{0.9\%}\) before simulating the line list of symptom onset times.
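The IFR scaling amounts to dividing each day's deaths by 0.009, so one death stands for roughly 111 infections. A minimal sketch, in which the function name, the rounding to whole individuals, and the death counts are our assumptions:

```python
IFR = 0.009  # point estimate quoted in the text


def implied_infections(daily_deaths, ifr=IFR):
    """Scale daily deaths to the number of infections they imply.

    Rounding to whole individuals is an assumption made so that each
    implied infection can be given its own simulated onset time.
    """
    return [round(deaths / ifr) for deaths in daily_deaths]


daily_deaths = [1, 3, 9]  # hypothetical
print(implied_infections(daily_deaths))             # [111, 333, 1000]
# The credible interval bounds give a sense of the sensitivity to the IFR.
print(implied_infections(daily_deaths, ifr=0.004))  # [250, 750, 2250]
print(implied_infections(daily_deaths, ifr=0.014))  # [71, 214, 643]
```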
The resulting estimates of latent and infectious individuals are intended to represent all unnotified cases.
US Case Reports by State
US data are maintained by CEID and available .
Data are compiled from https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_the_United_States, which is updated by anonymous contributors.
We aim to update our dataset daily.
Fatalities by Country
Country-level fatality data are maintained by CEID and available on the tab called “Cumulative fatalities reported by country.”
Data are compiled from various sources, listed in the dataset.
We aim to update our dataset daily.
1. Analysis by the Center for the Ecology of Infectious Diseases of US data. See supplemental information.
2. Analysis by the Center for the Ecology of Infectious Diseases of US data suggests a mean incubation period of 7.67 days. See supplemental information.
3. Kenji Mizumoto, et al. Estimating the Asymptomatic Ratio of 2019 Novel Coronavirus onboard the Princess Cruises Ship, 2020. medRxiv preprint. https://doi.org/10.1101/2020.02.20.20025866
4. Sung-mok Jung, et al. Real-Time Estimation of the Risk of Death from Novel Coronavirus (COVID-19) Infection: Inference Using Exported Cases. February 14, 2020, Journal of Clinical Medicine. https://doi.org/10.3390/jcm9020523
5. Neil M Ferguson, et al. Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand. March 16, 2020, Imperial College COVID-19 Response Team. https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-NPI-modelling-16-03-2020.pdf