Estimating R_0 and other parameters for the 2019-nCov epidemic

Summary

The epidemiology of the global 2019-nCov epidemic is poorly understood. Identifying the key processes that shape transmission and estimating the relevant model parameters is therefore an important task. This document presents arguments and analysis to support the estimation of a number of key quantities.

Key parameters investigated in this document include:

Epidemic curve
Basic reproduction number (\(R_0\))
Case detection rate (\(q\))
Incubation period (\(1/\sigma\))
Lag between symptom onset and isolation
Transmissibility (\(\beta\))
Additional parameters

Data

Key resources used for this investigation include:

A record of cumulative confirmed case reports at the national level maintained by the Drake lab at the University of Georgia.
A record of case reports at the province level maintained by the Drake lab at the University of Georgia.
A "line list" maintained by Moritz Kraemer containing case-level information, including start dates for key individual events (presentation of symptoms, hospitalization, case notification, etc.)
A "line list" maintained by MOBS containing case-level information, including start dates for key individual events (presentation of symptoms, hospitalization, case notification, etc.)

Note: There are known discrepancies among these data sets.

Epidemic curve

To date, the majority of infections have occurred in Hubei province, as illustrated by the series of case notifications over time. (Note that case notifications lag the actual number of infections by days to weeks, so a plot like this depicts a past state of the epidemic.)

Case notifications in Hubei (blue) and the rest of China.

Basic reproduction number \(R_0\)

Overview

\(R_0\), the basic reproduction number, is defined as the average number of secondary cases expected to arise from a single infected individual in a wholly susceptible population. \(R_{eff}\), the effective reproduction number, refers to the expected number of secondary cases to arise from an arbitrary case at any point in time. \(R_{eff}\) is expected to change over the course of an outbreak. Containment will occur when \(R_{eff}<1\).

Estimating \(R_0\) and \(R_{eff}\) in this outbreak is challenging because:

1. There is little information from the first few infection generations.
2. The incubation period and the time from presentation of symptoms to hospitalization are not exponentially distributed.
3. Interventions and policies intended to curtail the outbreak have affected the unfolding process and are therefore reflected in the case notification data.

Takeoff estimators

We have considered "takeoff" estimators (e.g. Wearing et al. https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020174). Wearing et al. show that the dynamics of an epidemic in its early phases are related to \(R_0\). A regression line through a plot of case notifications over time, from the start of the outbreak on 1 December 2019 (neglecting the first couple of points), clearly does not go through the origin, suggesting that the epidemic was already in its exponential phase by Day 47 (Jan 16). If \(\gamma\) were constant after Day 47 (which it probably is not, due to increasing case isolation), the number of cases would be expected to grow according to the proportionality

\[\begin{equation} \log(X_t) \propto (R_0-1)\gamma t. \end{equation}\]

Although \(\gamma\) is changing, we can nevertheless use this method to provide upper and lower bounds on \(R_{eff}\) by considering the plausible range of \(\gamma\), i.e., from its minimum at \(\gamma \approx 1/7\) to its maximum at \(\gamma \approx 1\). For a given positive growth rate, the smaller value of \(\gamma\) gives the upper bound.
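As a rough illustration of the bounding argument, the sketch below fits a least-squares slope to log case counts and inverts the proportionality above for the two limiting values of \(\gamma\). The case counts are hypothetical (generated with an exact growth rate of 0.2 per day), not the notification data, and the function names `growth_rate` and `reproduction_number` are ours.

```python
import math

def growth_rate(days, cases):
    """Ordinary least-squares slope of log(cases) against day."""
    logs = [math.log(c) for c in cases]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    sxx = sum((t - mean_t) ** 2 for t in days)
    return sxy / sxx

def reproduction_number(slope, gamma):
    """Invert slope = (R - 1) * gamma for R."""
    return 1.0 + slope / gamma

# Hypothetical counts from Day 47 onward, growing at exactly 0.2 per day.
days = list(range(47, 57))
cases = [50.0 * math.exp(0.2 * (t - 47)) for t in days]

slope = growth_rate(days, cases)
r_upper = reproduction_number(slope, 1 / 7)  # slowest removal, gamma = 1/7
r_lower = reproduction_number(slope, 1.0)    # fastest removal, gamma = 1
print(f"slope = {slope:.3f}; R between {r_lower:.2f} and {r_upper:.2f}")
```

With real data the fitted slope would replace the synthetic one; the two calls to `reproduction_number` then give the bracket implied by the assumed range of \(\gamma\).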

A second approach uses the first significant report in these data (41 cases on Day 41, Jan 10). During the pre-exponential phase of the epidemic, \(R_0\) is related to the takeoff rate by

\[\begin{equation} R_0 = \lambda_2(\lambda_2/(\sigma m)+1)^m/(\gamma (1-(\lambda_2/(\gamma n)+1)^{-n})), \end{equation}\]

where \(\lambda_2\) is the slope of the takeoff at the beginning of the epidemic, \(1/\sigma\) is the average latent period, \(1/\gamma\) is the average infectious period, and \(m\) and \(n\) are the shape parameters of the Erlang distributions for the latent and infectious periods, respectively (Wearing et al.).

For an assumed initial case on Day \(t_1=1\) and \(x=41\) cases on Day \(t=41\), we have \(\lambda_2 \approx \frac{\log x}{t-t_1} = \frac{\log 41}{41-1} = 0.0928\). Inserting this into the formula yields the estimate \(\hat R_0 \approx 2.08\).
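The calculation can be sketched as follows. The takeoff slope is computed exactly as in the text, but the values of \(\sigma\), \(\gamma\), \(m\) and \(n\) behind the reported estimate are not stated here, so the values below are placeholders chosen for illustration (a 5.4-day latent period and a 7-day infectious period), and the result need not match the figure quoted above. The function name `takeoff_r0` is ours.

```python
import math

def takeoff_r0(lam, sigma, gamma, m, n):
    """R_0 from the takeoff slope, with Erlang latent (shape m) and
    infectious (shape n) periods, following the formula in the text."""
    numerator = lam * (lam / (sigma * m) + 1) ** m
    denominator = gamma * (1 - (lam / (gamma * n) + 1) ** (-n))
    return numerator / denominator

# Takeoff slope: one case on Day 1, 41 cases on Day 41.
lam = math.log(41) / (41 - 1)

# Assumed values, for illustration only.
sigma = 1 / 5.4  # latent period taken equal to the incubation mean (assumption)
gamma = 1 / 7    # 7-day infectious period (assumption)

for m, n in [(1, 1), (2, 2), (4, 4)]:
    print(f"m={m}, n={n}: R0 = {takeoff_r0(lam, sigma, gamma, m, n):.2f}")
```

For \(m=n=1\) the expression reduces to \((\lambda_2/\sigma+1)(\lambda_2/\gamma+1)\), the familiar result for exponentially distributed periods, which is a convenient check on the implementation.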

Estimates of \(R_0\) for China using two methods.

Case detection rate (\(q\))

We have sought to estimate the case detection rate by adjusting the observed case fatality rate for cases reported in Wuhan from January 1 to January 19 to the presumed actual case fatality rate of 3%, on the assumption that the difference between these two rates is primarily due to the under-reporting of less severe cases. For a derivation of our estimator, see . The following plot shows the estimated probability distribution of \(q\).

Probability distribution of case detection rate in Wuhan (January 1, 2020 - January 19, 2020).

Incubation period

Travelers who acquired the infection in a location where transmission was occurring and developed symptoms at a later time provide an opportunity to estimate the incubation period. Using data from Kraemer's line list, we estimate the incubation period to have a mean of 5.4 days. Additionally, the distribution is estimated to be Erlang with shape parameter \(k\).

Estimated and observed distribution of incubation times.

Lag between symptom onset and isolation

A key determinant of transmission is the lag between the onset of symptoms and isolation. The inverse of this quantity is the isolation rate. By maximum likelihood, we have determined that \(k=2\) is the most likely shape parameter for an Erlang distributed onset to isolation interval.

Estimated and observed distribution of onset-to-isolation intervals.
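One way to select an Erlang shape parameter by maximum likelihood is a profile search over integer \(k\): for a fixed shape, the maximum likelihood rate is \(k\) divided by the sample mean, so only \(k\) needs to be scanned. The sketch below uses hypothetical intervals, not the line-list data, and is not necessarily the procedure used for the estimate above; `erlang_loglik` and `best_shape` are our names.

```python
import math

def erlang_loglik(data, k):
    """Log-likelihood of an Erlang(k, rate) model with the rate set to
    its maximum likelihood value for this k (rate = k / sample mean)."""
    n = len(data)
    rate = k / (sum(data) / n)
    return (n * k * math.log(rate)
            - n * math.lgamma(k)                      # log((k-1)!)
            + (k - 1) * sum(math.log(x) for x in data)
            - rate * sum(data))

def best_shape(data, max_k=10):
    """Integer shape in 1..max_k with the highest profile log-likelihood."""
    return max(range(1, max_k + 1), key=lambda k: erlang_loglik(data, k))

# Hypothetical onset-to-isolation intervals in days (must be positive).
intervals = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.5, 8.0, 1.5, 3.5]

k_hat = best_shape(intervals)
print(f"most likely shape k = {k_hat}, rate = {k_hat / (sum(intervals) / len(intervals)):.3f} per day")
```

Restricting \(k\) to integers keeps the fitted distribution compatible with a chain of \(k\) identical exponential stages in a compartmental model.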

Case fatality rate

Here we estimate the case fatality rate using the number of reported deaths (\(D\)) and cases (\(C\)) in each province. The updated data lack information on recoveries, so these estimates are based on the raw CFR (\(D/C\)) and may be considered a lower bound. The grand mean is 3.9% (confidence interval: [3.78%, 4.05%]).

The standard of care for COVID has evolved over the course of the epidemic, and the case fatality rate outside Hubei is considerably lower than in Hubei. Outside Hubei, the crude case fatality rate is 0.88% (confidence interval: [0.7%, 1.1%]).
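For reference, a crude CFR and an approximate interval can be computed from pooled counts as sketched below. The method behind the intervals reported above is not stated, so the normal-approximation (Wald) binomial interval used here is an assumption, and the death and case counts are hypothetical.

```python
import math

def crude_cfr(deaths, cases, z=1.96):
    """Crude CFR (deaths / cases) with a normal-approximation interval.
    z = 1.96 gives an approximate 95% interval (assumed method)."""
    cfr = deaths / cases
    se = math.sqrt(cfr * (1 - cfr) / cases)
    return cfr, max(0.0, cfr - z * se), min(1.0, cfr + z * se)

# Hypothetical pooled counts, not taken from the province-level data.
deaths, cases = 50, 2000

cfr, lower, upper = crude_cfr(deaths, cases)
print(f"CFR = {100 * cfr:.2f}% (interval: [{100 * lower:.2f}%, {100 * upper:.2f}%])")
```

The normal approximation is least reliable when deaths are few; an exact or Wilson interval may be preferable in that case.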

Transmissibility (\(\beta\))

Transmissibility (\(\beta\)) is obtained by rearranging the equation \(R_0 = \frac{\beta}{\gamma}\) to give \(\beta = R_0 \times \gamma\). Taking our estimates of \(R_0\) from the early stages of the outbreak and a fixed infectious period of 7 days (\(\gamma = 1/7\)), we conclude that \(\beta\) ranges from \(\beta = R_0 \times \gamma = 1.6/7 = 0.229\) to \(\beta = R_0 \times \gamma = 2.1/7 = 0.3\). For comparison, if we use the Imperial College estimate of \(R_0=2.6\), we obtain \(\beta = 2.6/7 = 0.371\).
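The arithmetic above can be reproduced with a few lines; the helper name `beta_from_r0` is ours, and \(\gamma\) is taken to be in units of per day.

```python
def beta_from_r0(r0, gamma):
    """Transmissibility implied by R_0 = beta / gamma."""
    return r0 * gamma

gamma = 1 / 7  # fixed 7-day infectious period, as in the text

for r0 in (1.6, 2.1, 2.6):
    print(f"R0 = {r0}: beta = {beta_from_r0(r0, gamma):.3f}")
```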

Additional parameters

Population size of Wuhan: \(N \approx 11,081,000\) (Wikipedia)

Population size of Hubei: \(N \approx 59,002,000\) (Wikipedia)

March 13, 2020