# Adjust a univariate time series of counts for observed but-not-yet-reported events

`nowcast.Rd`

Nowcasting can help to obtain up-to-date information on trends during a situation where reports about events arrive with delay. For example in public health reporting, reports about important indicators (such as occurrence of cases) are prone to be delayed due to for example manual quality checking and reporting system hierarchies. Altogether, the delays are subject to a delay distribution, which may, or may not, vary over time.

## Usage

```
nowcast(now, when, data, dEventCol="dHospital", dReportCol="dReport",
method=c("bayes.notrunc", "bayes.notrunc.bnb", "lawless",
"bayes.trunc", "unif", "bayes.trunc.ddcp"),
aggregate.by="1 day",
D=15,
m=NULL, m.interpretation=c("hoehle_anderheiden2014", "lawless1994"),
control=list(
dRange=NULL, alpha=0.05, nSamples=1e3,
N.tInf.prior=c("poisgamma","pois","unif"),
N.tInf.max=300, gd.prior.kappa=0.1,
ddcp=list(ddChangepoint=NULL,
cp_order=c("zero","one"),
Wextra=NULL,
logLambda=c("iidLogGa","tps","rw1","rw2"),
responseDistr=c("poisson", "negbin"),
mcmc=c(burnin=2500, sample=10000, thin=1, adapt=1000,
store.samples=FALSE)),
score=FALSE, predPMF=FALSE))
```

## Arguments

- now
an object of class

`Date`

denoting the day at which to do the nowcast. This corresponds to \(T\) in the notation of Höhle and an der Heiden (2014).- when
a vector of

`Date`

objects denoting the day(s) for which the projections are to be done. One needs to ensure that each element in`when`

is smaller or equal to`now`

.- data
A data frame with one row per case – for each case on needs information on the day of the event (e.g. hospitalization) and the day of report of this event.

- dEventCol
The name of the column in

`data`

which contains the date of the event, e.g. hospitalization. Default:`"dHospital"`

.- dReportCol
Name of the column in

`data`

containing the date at which the report arrives at the respective register. Default:`"dReport"`

.- method
A vector of strings denoting the different methods for doing the nowcasting. Note that results of the first name in this list are officially returned by the function. However, it is possible to specify several methods here, e.g., in order to compare score evaluations. Details of the methods are described in Höhle and an der Heiden (2014).

`"unif"`

`"bayes.notrunc"`

A Bayesian procedure ignoring truncation.

`"bayes.notrunc.bnb"`

A fast Bayesian procedure ignoring truncation and which calculates the adjustment per-time (i.e. ignoring other delays) using the negative binomial.

`"lawless"`

A discretized version of the Gaussian predictive distribution suggested in Lawless (1994).

`"bayes.trunc"`

Bayesian method based on the generalized Dirichlet distribution, which is the conjugate prior-posterior for the delay distribution PMF under right-truncated sampling as shown in HadH (2014).

`"bayes.trunc.ddcp"`

Fully Bayesian method allowing for change-points in the delay distribution, e.g., due to speed-ups in the reporting process. A discrete-survival model is used for the delay distribution. Details of the methods are described in HadH (2014). Note: This method requires that the JAGS program is installed on the system.

- aggregate.by
Time scale used for the temporal aggregation of the records in the data

`data`

. See`linelist2sts`

and`seq.Date`

for further information.- D
Maximum possible or maximum relevant delay (unit:

`aggregate.by`

). Default: 15.- m
Size of the moving window for the estimation of the delay distribution. Default:

`NULL`

, i.e. take all values at all times. Otherwise: a positive integer equal to or greater than`D`

such that only values from a sliding window are used. The shape of the window depends on the value of`m.interpretation`

.- m.interpretation
This parameter controls the interpretation of the sliding window used to estimate the delay distribution. If

`m.interpretation="hoehle_anderheiden2014"`

(Default) then the sliding window is defined as a horizontal cut in the reporting triangle, i.e. the values for the delay estimation originate from reports occurring during`(now-m):now`

. This means that the estimation of long delays is based on fewer observations than the estimation of the short delays, hence, the long delay estimates are subject to more variability. If for example \(m=D\) then the estimate for a delay of \(d=D\) is based on only one observation. The advantage of this choice is that one explicitly knows which time period all observations originate from. For details see Section 3 of Höhle and an der Heiden (2014).Alternatively, when

`m.interpretation`

="lawless1994", the cut in the reporting triangle is made such that each delay`d`

is estimated based on the same number of observations (\(m+1\)). This means that in order to estimate the delay for \(d\) days, a sliding rectangle of length \(m+1\) containing the reports which occurred during`(now-m-d):now`

. See Fig. 2 in Lawless (1994) for details. Note: A warning is given is`method="lawless"`

, but`m.interpretation`

is not.- control
A list with named arguments controlling the functionality of the nowcasting.

- dRange
Default:

`NULL`

. In this case the`dEventCol`

column is used to extract the first and last available in`data`

.- alpha
Equal tailed (1-\(\alpha\))*100% prediction intervals are calculated. Default: 0.05.

- nSamples
Number of PMF samples in the

`bayes.*`

procedures. Note: Entire vectors containing the PMF on the grid from 0 to`N.tInf.max`

are drawn and which are then combined. The argument does not apply to the`bayes.trunc.ddcp`

method.- N.tInf.prior
Prior distribution of \(N(t,\infty)\). Applies only to the

`bayes.*`

except`bayes.bayes.ddcp`

methods. See example on how to control the distribution parameters.- N.tInf.max
Limit of the support of \(N(t,\infty)\). The value needs to be high enough such that at this limit only little of the predictive distribution is right-truncated. Default: 300.

- gd.prior.kappa
Concentration parameter for the Dirichlet prior for the delay distribution on \(0,...,D\). Default: 0.1. Note: The procedure is quite sensitive to this parameter in case only few cases are available.

- ddcp
A list specifying the change point model for the delay distribution. This method should only be used if detailed information about changes in the delay distribution are available as, e.g., in the case of the STEC O104:H4 outbreak. The components are as follows:

`ddChangepoint`

Vector of Date objects corresponding to the changepoints

`cp_order`

Either

`"zero"`

(Default) or`"one"`

. This is the degree of the TPS spline for the baseline hazard, which is formed by the changepoints. Order zero corresponds to the dummy variables of the change-points being simply zero or one. In case a 1st order polynomial is chosen, this allows the delay distribution to change towards faster or slow reporting as time progresses (until the next change-point). The later can be helpful in very dynamic epidemic situations where a lot of cases suddenly appear overwhelming the surveillance system infrastructure.`Wextra`

An additional design matrix part to be joined onto the part originating from the change-points. Altogether, the column bind of these two quantities will be \(W_{t,d}\). This allows one to include, e.g., day of the week effects or holidays.

`logLambda`

Prior on the spline. One of

`c("iidLogGa","tps","rw1","rw2")`

.`respDistr`

Response distribution of \(n_{t,d}\) in the reporting triangle. Default is

`"poisson"`

. An experimental alternative is to use`"negbin"`

.`tau.gamma`

`eta.mu`

Vector of coefficients describing the mean of the prior normal distribution of the regression effects in the discrete time survival model.

`eta.prec`

A precision matrix for the regression effects in the discrete time survival model.

`mcmc`

A named vector of length 5 containing burn-in (default: 2500), number of samples (10000), thinning (1) and adaptation (1000) for the three MCMC chains which are ran. The values are passed on to

`run.jags`

. The fifth argument`store.samples`

denotes if the output of the JAGS sampling should be included as part of the returned`stsNC`

object. Warning: If`TRUE`

(Default:`FALSE`

) the size of the returned object might increase substantially.

- score
Compute scoring rules. Default:

`FALSE`

. The computed scores are found in the`SR`

slot of the result.- predPMF
Boolean whether to return the probability mass functions of the individual forecasts (Default:

`FALSE`

). The result can be found in the`control`

slot of the return object.

## Details

The methodological details of the nowcasting procedures are described in Höhle M and an der Heiden M (2014).

## Value

`nowcast`

returns an object of `"stsNC"`

. The
`upperbound`

slot contains the median of the method specified at
the first position the argument `method`

. The slot `pi`

(for
prediction interval)
contains the equal tailed (1-\(\alpha\))*100% prediction
intervals, which are calculated based on the predictive distributions
in slot `predPMF`

.
Furthermore, slot `truth`

contains an `sts`

object
containing the true number of cases (if possible to compute it is based on
the data in `data`

). Finally, slot `SR`

contains the results
for the proper scoring rules (requires truth to be calculable).

## References

Höhle, M. and an der Heiden, M. (2014): Bayesian nowcasting
during the STEC O104:H4 outbreak in Germany, 2011. *Biometrics*
70(4):993-1002. doi:10.1111/biom.12194
.

A preprint is available as
https://staff.math.su.se/hoehle/pubs/hoehle_anderheiden2014-preprint.pdf.

Günther, F. and Bender, A. and Katz, K. and
Küchenhoff, H. and Höhle, M. (2020):
Nowcasting the COVID-19 pandemic in Bavaria.
*Biometrical Journal*. doi:10.1002/bimj.202000112

Preprint available at doi:10.1101/2020.06.26.20140210
.

## Note

Note: The `bayes.trunc.ddcp`

uses the JAGS software together with
the R package runjags to handle the parallelization of
the MCMC using the `"rjparallel"`

method of
`run.jags`

, which additionally requires the
rjags package. You need to manually install
JAGS on your computer for the package to work – see
https://mcmc-jags.sourceforge.io/
and the documentation of runjags for details.

Note: The function is still under development and might change in the future. Unfortunately, little emphasis has so far been put on making the function easy to understand and use.

## Examples

```
data("husO104Hosp")
#Extract the reporting triangle at a specific day
t.repTriangle <- as.Date("2011-07-04")
#Use 'void' nowcasting procedure (we just want the reporting triangle)
nc <- nowcast(now=t.repTriangle,when=t.repTriangle,
dEventCol="dHosp",dReportCol="dReport",data=husO104Hosp,
D=15,method="unif")
#Show reporting triangle
reportingTriangle(nc)
#Perform Bayesian nowcasting assuming the delay distribution is stable over time
nc.control <- list(N.tInf.prior=structure("poisgamma",
mean.lambda=50,var.lambda=3000),
nSamples=1e2)
t.repTriangle <- as.Date("2011-06-10")
when <- seq(t.repTriangle-3,length.out=10,by="-1 day")
nc <- nowcast(now=t.repTriangle,when=when,
dEventCol="dHosp",dReportCol="dReport",data=husO104Hosp,
D=15,method="bayes.trunc",control=nc.control)
#Show time series and posterior median forecast/nowcast
plot(nc,xaxis.tickFreq=list("%d"=atChange,"%m"=atChange),
xaxis.labelFreq=list("%d"=at2ndChange),xaxis.labelFormat="%d-%b",
xlab="Time (days)",lty=c(1,1,1,1),lwd=c(1,1,2))
if (FALSE) { # \dontrun{
### Using runjags to do a Bayesian model with changepoint(s)
### -- this might take a while
nc.control.ddcp <- modifyList(nc.control,
list(gd.prior.kappa=0.1,
ddcp=list(ddChangepoint=as.Date(c("2011-05-23")),
logLambda="tps",
tau.gamma=1,
mcmc=c(burnin=1000,sample=1000,thin=1,
adapt=1000,store.samples=FALSE))))
nc.ddcp <- nowcast(now=t.repTriangle,when=when,
dEventCol="dHosp",dReportCol="dReport",
data=husO104Hosp, aggregate.by="1 day",
method="bayes.trunc.ddcp", D=15,
control=nc.control.ddcp)
plot(nc.ddcp,legend.opts=NULL,
xaxis.tickFreq=list("%d"=atChange,"%m"=atChange),
xaxis.labelFreq=list("%d"=at2ndChange),xaxis.labelFormat="%d-%b",
xlab="Time (days)",lty=c(1,1,1,1),lwd=c(1,1,2))
lambda <- attr(delayCDF(nc.ddcp)[["bayes.trunc.ddcp"]],"model")$lambda
showIdx <- seq(which( max(when) == epoch(nc.ddcp))) #seq(ncol(lambda))
matlines( showIdx,t(lambda)[showIdx,],col="gray",lwd=c(1,2,1),lty=c(2,1,2))
legend(x="topright",c(expression(lambda(t)),"95% CI"),col="gray",lwd=c(2,1),lty=c(1,2))
} # }
```