Adjust a univariate time series of counts for observed but-not-yet-reported events

Nowcasting can help to obtain up-to-date information on trends during a situation where reports about events arrive with delay. For example in public health reporting, reports about important indicators (such as occurrence of cases) are prone to be delayed due to for example manual quality checking and reporting system hierarchies. Altogether, the delays are subject to a delay distribution, which may, or may not, vary over time.

Usage

nowcast(now, when, data, dEventCol="dHospital", dReportCol="dReport",
        method=c("bayes.notrunc", "bayes.notrunc.bnb", "lawless",
                 "bayes.trunc", "unif", "bayes.trunc.ddcp"),
        aggregate.by="1 day",
        D=15,
        m=NULL, m.interpretation=c("hoehle_anderheiden2014", "lawless1994"),
        control=list(
            dRange=NULL, alpha=0.05, nSamples=1e3,
            N.tInf.prior=c("poisgamma","pois","unif"),
            N.tInf.max=300, gd.prior.kappa=0.1,
            ddcp=list(ddChangepoint=NULL,
                      cp_order=c("zero","one"),
                      Wextra=NULL,
                      logLambda=c("iidLogGa","tps","rw1","rw2"),
                      responseDistr=c("poisson", "negbin"),
                      mcmc=c(burnin=2500, sample=10000, thin=1, adapt=1000,
                             store.samples=FALSE)),
            score=FALSE, predPMF=FALSE))

Arguments

now

an object of class Date denoting the day at which to do the nowcast. This corresponds to \(T\) in the notation of Höhle and an der Heiden (2014).

when

a vector of Date objects denoting the day(s) for which the projections are to be done. One needs to ensure that each element in when is smaller or equal to now.

data

A data frame with one row per case – for each case on needs information on the day of the event (e.g. hospitalization) and the day of report of this event.

dEventCol

The name of the column in data which contains the date of the event, e.g. hospitalization. Default: "dHospital".

dReportCol

Name of the column in data containing the date at which the report arrives at the respective register. Default: "dReport".

method

A vector of strings denoting the different methods for doing the nowcasting. Note that results of the first name in this list are officially returned by the function. However, it is possible to specify several methods here, e.g., in order to compare score evaluations. Details of the methods are described in Höhle and an der Heiden (2014).

"unif"
"bayes.notrunc": A Bayesian procedure ignoring truncation.
"bayes.notrunc.bnb": A fast Bayesian procedure ignoring truncation and which calculates the adjustment per-time (i.e. ignoring other delays) using the negative binomial.
"lawless": A discretized version of the Gaussian predictive distribution suggested in Lawless (1994).
"bayes.trunc": Bayesian method based on the generalized Dirichlet distribution, which is the conjugate prior-posterior for the delay distribution PMF under right-truncated sampling as shown in HadH (2014).
"bayes.trunc.ddcp": Fully Bayesian method allowing for change-points in the delay distribution, e.g., due to speed-ups in the reporting process. A discrete-survival model is used for the delay distribution. Details of the methods are described in HadH (2014). Note: This method requires that the JAGS program is installed on the system.

aggregate.by

Time scale used for the temporal aggregation of the records in the data data. See linelist2sts and seq.Date for further information.

D

Maximum possible or maximum relevant delay (unit: aggregate.by). Default: 15.

m

Size of the moving window for the estimation of the delay distribution. Default: NULL, i.e. take all values at all times. Otherwise: a positive integer equal to or greater than D such that only values from a sliding window are used. The shape of the window depends on the value of m.interpretation.

m.interpretation

This parameter controls the interpretation of the sliding window used to estimate the delay distribution. If m.interpretation="hoehle_anderheiden2014" (Default) then the sliding window is defined as a horizontal cut in the reporting triangle, i.e. the values for the delay estimation originate from reports occurring during (now-m):now. This means that the estimation of long delays is based on fewer observations than the estimation of the short delays, hence, the long delay estimates are subject to more variability. If for example \(m=D\) then the estimate for a delay of \(d=D\) is based on only one observation. The advantage of this choice is that one explicitly knows which time period all observations originate from. For details see Section 3 of Höhle and an der Heiden (2014).

Alternatively, when m.interpretation="lawless1994", the cut in the reporting triangle is made such that each delay d is estimated based on the same number of observations (\(m+1\)). This means that in order to estimate the delay for \(d\) days, a sliding rectangle of length \(m+1\) containing the reports which occurred during (now-m-d):now. See Fig. 2 in Lawless (1994) for details. Note: A warning is given is method="lawless", but m.interpretation is not.

control

A list with named arguments controlling the functionality of the nowcasting.

dRange

Default: NULL. In this case the dEventCol column is used to extract the first and last available in data.

alpha

Equal tailed (1-\(\alpha\))*100% prediction intervals are calculated. Default: 0.05.

nSamples

Number of PMF samples in the bayes.* procedures. Note: Entire vectors containing the PMF on the grid from 0 to N.tInf.max are drawn and which are then combined. The argument does not apply to the bayes.trunc.ddcp method.

N.tInf.prior

Prior distribution of \(N(t,\infty)\). Applies only to the bayes.* except bayes.bayes.ddcp methods. See example on how to control the distribution parameters.

N.tInf.max

Limit of the support of \(N(t,\infty)\). The value needs to be high enough such that at this limit only little of the predictive distribution is right-truncated. Default: 300.

gd.prior.kappa

Concentration parameter for the Dirichlet prior for the delay distribution on \(0,...,D\). Default: 0.1. Note: The procedure is quite sensitive to this parameter in case only few cases are available.

ddcp

A list specifying the change point model for the delay distribution. This method should only be used if detailed information about changes in the delay distribution are available as, e.g., in the case of the STEC O104:H4 outbreak. The components are as follows:

ddChangepoint: Vector of Date objects corresponding to the changepoints
cp_order: Either "zero" (Default) or "one". This is the degree of the TPS spline for the baseline hazard, which is formed by the changepoints. Order zero corresponds to the dummy variables of the change-points being simply zero or one. In case a 1st order polynomial is chosen, this allows the delay distribution to change towards faster or slow reporting as time progresses (until the next change-point). The later can be helpful in very dynamic epidemic situations where a lot of cases suddenly appear overwhelming the surveillance system infrastructure.
Wextra: An additional design matrix part to be joined onto the part originating from the change-points. Altogether, the column bind of these two quantities will be \(W_{t,d}\). This allows one to include, e.g., day of the week effects or holidays.
logLambda: Prior on the spline. One of c("iidLogGa","tps","rw1","rw2").
respDistr: Response distribution of \(n_{t,d}\) in the reporting triangle. Default is "poisson". An experimental alternative is to use "negbin".
tau.gamma
eta.mu: Vector of coefficients describing the mean of the prior normal distribution of the regression effects in the discrete time survival model.
eta.prec: A precision matrix for the regression effects in the discrete time survival model.
mcmc: A named vector of length 5 containing burn-in (default: 2500), number of samples (10000), thinning (1) and adaptation (1000) for the three MCMC chains which are ran. The values are passed on to run.jags. The fifth argument store.samples denotes if the output of the JAGS sampling should be included as part of the returned stsNC object. Warning: If TRUE (Default: FALSE) the size of the returned object might increase substantially.

score

Compute scoring rules. Default: FALSE. The computed scores are found in the SR slot of the result.

predPMF

Boolean whether to return the probability mass functions of the individual forecasts (Default: FALSE). The result can be found in the control slot of the return object.

Details

The methodological details of the nowcasting procedures are described in Höhle M and an der Heiden M (2014).

Value

nowcast returns an object of "stsNC". The upperbound slot contains the median of the method specified at the first position the argument method. The slot pi (for prediction interval) contains the equal tailed (1-\(\alpha\))*100% prediction intervals, which are calculated based on the predictive distributions in slot predPMF. Furthermore, slot truth contains an sts object containing the true number of cases (if possible to compute it is based on the data in data). Finally, slot SR contains the results for the proper scoring rules (requires truth to be calculable).

References

Höhle, M. and an der Heiden, M. (2014): Bayesian nowcasting during the STEC O104:H4 outbreak in Germany, 2011. Biometrics 70(4):993-1002. doi:10.1111/biom.12194 .
A preprint is available as https://staff.math.su.se/hoehle/pubs/hoehle_anderheiden2014-preprint.pdf.

Günther, F. and Bender, A. and Katz, K. and Küchenhoff, H. and Höhle, M. (2020): Nowcasting the COVID-19 pandemic in Bavaria. Biometrical Journal. doi:10.1002/bimj.202000112
Preprint available at doi:10.1101/2020.06.26.20140210 .

Author

Michael Höhle

Note

Note: The bayes.trunc.ddcp uses the JAGS software together with the R package runjags to handle the parallelization of the MCMC using the "rjparallel" method of run.jags, which additionally requires the rjags package. You need to manually install JAGS on your computer for the package to work – see https://mcmc-jags.sourceforge.io/ and the documentation of runjags for details.

Note: The function is still under development and might change in the future. Unfortunately, little emphasis has so far been put on making the function easy to understand and use.

Examples

data("husO104Hosp")

#Extract the reporting triangle at a specific day
t.repTriangle <- as.Date("2011-07-04")

#Use 'void' nowcasting procedure (we just want the reporting triangle)
nc <- nowcast(now=t.repTriangle,when=t.repTriangle,
              dEventCol="dHosp",dReportCol="dReport",data=husO104Hosp,
              D=15,method="unif")

#Show reporting triangle
reportingTriangle(nc)

#Perform Bayesian nowcasting assuming the delay distribution is stable over time
nc.control <- list(N.tInf.prior=structure("poisgamma",
                                mean.lambda=50,var.lambda=3000),
                                nSamples=1e2)

t.repTriangle <- as.Date("2011-06-10")
when <- seq(t.repTriangle-3,length.out=10,by="-1 day")
nc <- nowcast(now=t.repTriangle,when=when,
              dEventCol="dHosp",dReportCol="dReport",data=husO104Hosp,
              D=15,method="bayes.trunc",control=nc.control)

#Show time series and posterior median forecast/nowcast
plot(nc,xaxis.tickFreq=list("%d"=atChange,"%m"=atChange),
     xaxis.labelFreq=list("%d"=at2ndChange),xaxis.labelFormat="%d-%b",
     xlab="Time (days)",lty=c(1,1,1,1),lwd=c(1,1,2))

if (FALSE) { # \dontrun{
### Using runjags to do a Bayesian model with changepoint(s)
### -- this might take a while
nc.control.ddcp <- modifyList(nc.control,
                    list(gd.prior.kappa=0.1,
                         ddcp=list(ddChangepoint=as.Date(c("2011-05-23")),
                             logLambda="tps",
                             tau.gamma=1,
                             mcmc=c(burnin=1000,sample=1000,thin=1,
                                    adapt=1000,store.samples=FALSE))))

nc.ddcp <- nowcast(now=t.repTriangle,when=when,
               dEventCol="dHosp",dReportCol="dReport",
               data=husO104Hosp, aggregate.by="1 day",
               method="bayes.trunc.ddcp", D=15,
                   control=nc.control.ddcp)

plot(nc.ddcp,legend.opts=NULL,
     xaxis.tickFreq=list("%d"=atChange,"%m"=atChange),
     xaxis.labelFreq=list("%d"=at2ndChange),xaxis.labelFormat="%d-%b",
     xlab="Time (days)",lty=c(1,1,1,1),lwd=c(1,1,2))

lambda <- attr(delayCDF(nc.ddcp)[["bayes.trunc.ddcp"]],"model")$lambda
showIdx <- seq(which( max(when) == epoch(nc.ddcp))) #seq(ncol(lambda))
matlines( showIdx,t(lambda)[showIdx,],col="gray",lwd=c(1,2,1),lty=c(2,1,2))
legend(x="topright",c(expression(lambda(t)),"95% CI"),col="gray",lwd=c(2,1),lty=c(1,2))
} # }