# Simulation of Epidemic Data

`twinSIR_simulation.Rd`

This function simulates the infection (and removal) times of an epidemic. Besides the classical SIR type of epidemic, also SI, SIRS and SIS epidemics are supported. Simulation works via the conditional intensity of infection of an individual, given some (time varying) endemic covariates and/or some distance functions (epidemic components) as well as the fixed positions of the individuals. The lengths of the infectious and removed periods are generated following a pre-specified function (can be deterministic).

The `simulate`

method for objects of class
`"twinSIR"`

simulates new epidemic data using the model and
the parameter estimates of the fitted object.

## Usage

```
simEpidata(formula, data, id.col, I0.col, coords.cols, subset,
beta, h0, f = list(), w = list(), alpha, infPeriod,
remPeriod = function(ids) rep(Inf, length(ids)),
end = Inf, trace = FALSE, .allocate = NULL)
# S3 method for twinSIR
simulate(object, nsim = 1, seed = 1,
infPeriod = NULL, remPeriod = NULL,
end = diff(range(object$intervals)), trace = FALSE, .allocate = NULL,
data = object$data, ...)
```

## Arguments

- formula
an object of class

`"formula"`

(or one that can be coerced to that class): a symbolic description of the intensity model to be estimated. The details of model specification are given under Details.- data
a data.frame containing the variables in

`formula`

and the variables specified by`id.col`

,`I0.col`

and`coords.col`

(see below). It represents the “history” of the endemic covariates to use for the simulation. The form is similar to and can be an object of class`"epidata"`

. The simulation period is split up into*consecutive*intervals of constant endemic covariables. The data frame consists of a block of N (number of individuals) rows for each of those time intervals (all rows in a block share the same start and stop values... therefore the name “block”), where there is one row per individual in the block. Each row describes the (fixed) state of the endemic covariates of the individual during the time interval given by the start and stop columns (specified through the lhs of`formula`

).For the

`simulate`

method of class`"twinSIR"`

this should be the object of class`"epidata"`

used for the fit. This is a part of the return value of the function`twinSIR`

, if called with argument`keep.data`

set to`TRUE`

.- id.col
only if

`data`

does not inherit from`epidata`

: single index of the`id`

column in`data`

. Can be numeric (by column number) or character (by column name).

The`id`

column identifies the individuals in the data-frame. It will be converted to a factor variable and its levels serve also to identify individuals as argument to the`infPeriod`

function.- I0.col
only if

`data`

does not inherit from`epidata`

: single index of the`I0`

column in`data`

. Can be numeric (by column number), character (by column name) or`NULL`

.

The`I0`

column indicates if an individual is initially infectious, i.e. it is already infectious at the beginning of the first time block. Setting`I0.col = NULL`

is short for “there are no initially infectious individuals”. Otherwise, the variable must be logical or in 0/1-coding. As this variable is constant over time the initially infectious individuals are derived from the first time block only.- coords.cols
only if

`data`

does not inherit from`epidata`

: index*es*of the`coords`

column*s*in`data`

. Can be a numeric (by column number), a character (by column name) vector or`NULL`

.

These columns contain the coordinates of the individuals. It must be emphasized that the functions in this package currently assume*fixed positions*of the individuals during the whole epidemic. Thus, an individual has the same coordinates in every block. For simplicity, the coordinates are derived from the first time block only. The epidemic covariates are calculated based on the Euclidian distance between the individuals, see`f`

.- subset
an optional vector specifying a subset of the covariate history to be used in the simulation.

- beta
numeric vector of length equal the number of endemic (

`cox`

) terms on the rhs of`formula`

. It contains the effects of the endemic predictor (excluding the log-baseline`h0`

, see below) in the same order as in the formula.- h0
*either*a single number to specify a constant baseline hazard (equal to`exp(h0)`

)*or*a list of functions named`exact`

and`upper`

. In the latter case,`h0$exact`

is the true log-baseline hazard function and`h0$upper`

is a*piecewise constant upper bound*for`h0$exact`

. The function`h0$upper`

must inherit from`stepfun`

with`right=FALSE`

. Theoretically, the intensity function is left-continuous, thus`right=TRUE`

would be adequate, but in the implementation, when we evaluate the intensity at the`knots`

(change points) of`h0$upper`

we need its value for the subsequent interval.- f, w
see

`as.epidata`

.- alpha
a named numeric vector of coefficients for the epidemic covariates generated by

`f`

and`w`

. The names are matched against`names(f)`

and`names(w)`

. Remember that`alpha >= 0`

.- infPeriod
a function generating lengths of infectious periods. It should take one parameter (e.g.

`ids`

), which is a character vector of id's of individuals, and return appropriate infection periods for those individuals. Therefore, the value of the function should be of length`length(ids)`

. For example, for independent and identically distributed infection periods following \(Exp(1)\), the generating function is`function(ids) rexp(length(ids), rate=1)`

. For a constant infectious period of length c, it is sufficient to set`function (x) {c}`

.

For the`simulate`

method of class`"twinSIR"`

only, this can also be`NULL`

(the default), which means that the observed infectious periods of infected individuals are re-used when simulating a new epidemic and individuals with missing infectious periods (i.e. infection and recovery was not observed) are attributed to the mean observed infectious period.Note that it is even possible to simulate an SI-epidemic by setting

`infPeriod = function (x) {Inf}`

In other words: once an individual became infected it spreads the disease forever, i.e. it will never be removed.

- remPeriod
a function generating lengths of removal periods. Per default, once an individual was removed it will stay in this state forever (

`Inf`

). Therefore, it will not become at-risk (S) again and re-infections are not possible. Alternatively, always returning 0 as length of the removal period corresponds to a SIS epidemic. Any other values correspond to SIRS. Note that`end`

should be set to a finite value in these cases.- end
a single positive numeric value specifying the time point at which the simulation should be forced to end. By default, this is

`Inf`

, i.e. the simulation continues until there is no susceptible individual left.

For the`simulate`

method of class`"twinSIR"`

the default is to have equal simulation and observation periods.- trace
logical (or integer) indicating if (or how often) the sets of susceptible and infected individuals as well as the rejection indicator (of the rejection sampling step) should be

`cat`

ed. Defaults to`FALSE`

.- .allocate
number of blocks to initially allocate for the event history (i.e.

`.allocate*N`

rows). By default (`NULL`

), this number is set to`max(500, ceiling(nBlocks/100)*100)`

, i.e. 500 but at least the number of blocks in`data`

(rounded to the next multiple of 100). Each time the simulated epidemic exceeds the allocated space, the event history will be enlarged by`.allocate`

blocks.- object
an object of class

`"twinSIR"`

. This must contain the original`data`

used for the fit (see`data`

).- nsim
number of epidemics to simulate. Defaults to 1.

- seed
an integer that will be used in the call to

`set.seed`

before simulating the epidemics.- ...
unused (argument of the generic).

## Details

A model is specified through the `formula`

, which has the form

`cbind(start, stop) ~ cox(endemicVar1) * cox(endemicVar2)`

,

i.e. the right hand side has the usual form as in `lm`

, but
all variables are marked as being endemic by the special function
`cox`

. The effects of those predictor terms are specified by
`beta`

. The left hand side of the formula denotes the start
and stop columns in `data`

. This can be omitted, if `data`

inherits
from class `"epidata"`

in which case `cbind(start, stop)`

will be
used. The epidemic model component is specified by the arguments
`f`

and `w`

(and the associated coefficients `alpha`

).

If the epidemic model component is empty and `infPeriod`

always returns `Inf`

, then one actually simulates from a pure Cox model.

The simulation algorithm used is *Ogata's modified thinning*.
For details, see Höhle (2009), Section 4.

## Value

An object of class `"simEpidata"`

, which is a `data.frame`

with the
columns `"id"`

, `"start"`

, `"stop"`

, `"atRiskY"`

,

`"event"`

, `"Revent"`

and the coordinate columns (with the original
names from `data`

), which are all obligatory. These columns are followed
by all the variables appearing on the rhs of the `formula`

. Last but not
least, the generated columns with epidemic covariates corresponding to the
functions in the lists `f`

and `w`

are appended.

Note that objects of class `"simEpidata"`

also inherit from class

`"epidata"`

, thus all `"epidata"`

methods can be
applied.

The `data.frame`

is given the additional *attributes*

- "eventTimes"
numeric vector of infection time points (sorted chronologically).

- "timeRange"
numeric vector of length 2:

`c(min(start), max(stop))`

.- "coords.cols"
numeric vector containing the column indices of the coordinate columns in the resulting data-frame.

- "f"
this equals the argument

`f`

.- "w"
this equals the argument

`w`

.- "config"
a list with elements

`h0 = h0$exact`

,`beta`

and`alpha`

.- call
the matched call.

- terms
the

`terms`

object used.

If `nsim > 1`

epidemics are simulated by the

`simulate`

-method for fitted `"twinSIR"`

models, these are
returned in a list.

## References

Höhle, M. (2009), Additive-Multiplicative Regression Models for Spatio-Temporal Epidemics, Biometrical Journal, 51(6):961-978.

## See also

The `plot.epidata`

and `animate.epidata`

methods
for plotting and animating (simulated) epidemic data, respectively.
The `intensityplot.simEpidata`

method for plotting paths of
infection intensities.

Function `twinSIR`

for fitting spatio-temporal epidemic intensity
models to epidemic data.

## Examples

```
## Generate a data frame containing a hypothetic population with 100 individuals
set.seed(1234)
n <- 100
pos <- matrix(rnorm(n*2), ncol=2, dimnames=list(NULL, c("x", "y")))
pop <- data.frame(id=1:n, x=pos[,1], y=pos[,2],
gender=sample(0:1, n, replace=TRUE),
I0col=c(rep(1,3),rep(0,n-3)), # 3 initially infectious
start=rep(0,n), stop=rep(Inf,n))
## Simulate an SIR epidemic in this population
set.seed(123)
infPeriods <- setNames(c(1:3/10, rexp(n-3, rate=1)), 1:n)
epi <- simEpidata(
cbind(start,stop) ~ cox(gender), data = pop,
id.col = "id", I0.col = "I0col", coords.cols = c("x","y"),
beta = c(-2), h0 = -1, alpha = c(B1=0.1), f = list(B1=function(u) u<=1),
infPeriod = function(ids) infPeriods[ids],
##remPeriod = function(ids) rexp(length(ids), rate=0.1), end = 30 # -> SIRS
)
## extract event times by id
head(summary(epi)$byID)
## Plot the numbers of susceptible, infectious and removed individuals
plot(epi)
## load the 1861 Hagelloch measles epidemic
data("hagelloch")
summary(hagelloch)
plot(hagelloch)
## fit a simplistic twinSIR model
fit <- twinSIR(~ household, data = hagelloch)
## simulate a new epidemic from the above model
## with simulation period = observation period, re-using observed infPeriods
sim1 <- simulate(fit, data = hagelloch)
plot(sim1)
## check if we find similar parameters in the simulated epidemic
fitsim1 <- update(fit, data = sim1)
cbind(base = coef(fit), new = coef(fitsim1))
if (surveillance.options("allExamples")) {
## simulate only 10 days, using random infPeriods ~ Exp(0.1)
sim2 <- simulate(fit, data = hagelloch, seed = 2, end = 10,
infPeriod = function(ids) rexp(length(ids), rate = 0.1))
plot(sim2)
## simulate from a different model with manually specified parameters
set.seed(321)
simepi <- simEpidata(~ cox(AGE), data = hagelloch,
beta = c(0.1), h0 = -4, alpha = c(household = 0.05),
f = list(household = function(u) u == 0),
infPeriod = function(ids) rexp(length(ids), rate=1/8))
plot(simepi)
intensityplot(simepi)
## see if we correctly estimate the parameters
fitsimepi <- twinSIR(~ cox(AGE) + household, data = simepi)
cbind(true = c(0.05, -4, 0.1), est = coef(fitsimepi), confint(fitsimepi))
}
```