Simulation of Epidemic Data

This function simulates the infection (and removal) times of an epidemic. Besides the classical SIR type of epidemic, also SI, SIRS and SIS epidemics are supported. Simulation works via the conditional intensity of infection of an individual, given some (time varying) endemic covariates and/or some distance functions (epidemic components) as well as the fixed positions of the individuals. The lengths of the infectious and removed periods are generated following a pre-specified function (can be deterministic).

The simulate method for objects of class "twinSIR" simulates new epidemic data using the model and the parameter estimates of the fitted object.

Usage

simEpidata(formula, data, id.col, I0.col, coords.cols, subset,
           beta, h0, f = list(), w = list(), alpha, infPeriod,
           remPeriod = function(ids) rep(Inf, length(ids)),
           end = Inf, trace = FALSE, .allocate = NULL)

# S3 method for class 'twinSIR'
simulate(object, nsim = 1, seed = 1,
         infPeriod = NULL, remPeriod = NULL,
         end = diff(range(object$intervals)), trace = FALSE, .allocate = NULL,
         data = object$data, ...)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the intensity model to be estimated. The details of model specification are given under Details.

data

a data.frame containing the variables in formula and the variables specified by id.col, I0.col and coords.col (see below). It represents the “history” of the endemic covariates to use for the simulation. The form is similar to and can be an object of class "epidata". The simulation period is split up into consecutive intervals of constant endemic covariables. The data frame consists of a block of N (number of individuals) rows for each of those time intervals (all rows in a block share the same start and stop values... therefore the name “block”), where there is one row per individual in the block. Each row describes the (fixed) state of the endemic covariates of the individual during the time interval given by the start and stop columns (specified through the lhs of formula).

For the simulate method of class "twinSIR" this should be the object of class "epidata" used for the fit. This is a part of the return value of the function twinSIR, if called with argument keep.data set to TRUE.

id.col

only if data does not inherit from epidata: single index of the id column in data. Can be numeric (by column number) or character (by column name).
The id column identifies the individuals in the data-frame. It will be converted to a factor variable and its levels serve also to identify individuals as argument to the infPeriod function.

I0.col

only if data does not inherit from epidata: single index of the I0 column in data. Can be numeric (by column number), character (by column name) or NULL.
The I0 column indicates if an individual is initially infectious, i.e. it is already infectious at the beginning of the first time block. Setting I0.col = NULL is short for “there are no initially infectious individuals”. Otherwise, the variable must be logical or in 0/1-coding. As this variable is constant over time the initially infectious individuals are derived from the first time block only.

coords.cols

only if data does not inherit from epidata: indexes of the coords columns in data. Can be a numeric (by column number), a character (by column name) vector or NULL.
These columns contain the coordinates of the individuals. It must be emphasized that the functions in this package currently assume fixed positions of the individuals during the whole epidemic. Thus, an individual has the same coordinates in every block. For simplicity, the coordinates are derived from the first time block only. The epidemic covariates are calculated based on the Euclidean distance between the individuals, see f.

subset

an optional vector specifying a subset of the covariate history to be used in the simulation.

beta

numeric vector of length equal the number of endemic (cox) terms on the rhs of formula. It contains the effects of the endemic predictor (excluding the log-baseline h0, see below) in the same order as in the formula.

h0

either a single number to specify a constant baseline hazard (equal to exp(h0)) or a list of functions named exact and upper. In the latter case, h0$exact is the true log-baseline hazard function and h0$upper is a piecewise constant upper bound for h0$exact. The function h0$upper must inherit from stepfun with right=FALSE. Theoretically, the intensity function is left-continuous, thus right=TRUE would be adequate, but in the implementation, when we evaluate the intensity at the knots (change points) of h0$upper we need its value for the subsequent interval.

f, w

see as.epidata.

alpha

a named numeric vector of coefficients for the epidemic covariates generated by f and w. The names are matched against names(f) and names(w). Remember that alpha >= 0.

infPeriod

a function generating lengths of infectious periods. It should take one parameter (e.g. ids), which is a character vector of id's of individuals, and return appropriate infection periods for those individuals. Therefore, the value of the function should be of length length(ids). For example, for independent and identically distributed infection periods following $Exp(1)$, the generating function is function(ids) rexp(length(ids), rate=1). For a constant infectious period of length c, it is sufficient to set function (x) {c}.
For the simulate method of class "twinSIR" only, this can also be NULL (the default), which means that the observed infectious periods of infected individuals are re-used when simulating a new epidemic and individuals with missing infectious periods (i.e. infection and recovery was not observed) are attributed to the mean observed infectious period.

Note that it is even possible to simulate an SI-epidemic by setting

infPeriod = function (x) {Inf}

In other words: once an individual became infected it spreads the disease forever, i.e. it will never be removed.

remPeriod

a function generating lengths of removal periods. Per default, once an individual was removed it will stay in this state forever (Inf). Therefore, it will not become at-risk (S) again and re-infections are not possible. Alternatively, always returning 0 as length of the removal period corresponds to a SIS epidemic. Any other values correspond to SIRS. Note that end should be set to a finite value in these cases.

end

a single positive numeric value specifying the time point at which the simulation should be forced to end. By default, this is Inf, i.e. the simulation continues until there is no susceptible individual left.
For the simulate method of class "twinSIR" the default is to have equal simulation and observation periods.

trace

logical (or integer) indicating if (or how often) the sets of susceptible and infected individuals as well as the rejection indicator (of the rejection sampling step) should be cated. Defaults to FALSE.

.allocate

number of blocks to initially allocate for the event history (i.e. .allocate*N rows). By default (NULL), this number is set to max(500, ceiling(nBlocks/100)*100), i.e. 500 but at least the number of blocks in data (rounded to the next multiple of 100). Each time the simulated epidemic exceeds the allocated space, the event history will be enlarged by .allocate blocks.

object

an object of class "twinSIR". This must contain the original data used for the fit (see data).

nsim

number of epidemics to simulate. Defaults to 1.

seed

an integer that will be used in the call to set.seed before simulating the epidemics.

...

unused (argument of the generic).

Details

A model is specified through the formula, which has the form

cbind(start, stop) ~ cox(endemicVar1) * cox(endemicVar2),

i.e. the right hand side has the usual form as in lm, but all variables are marked as being endemic by the special function cox. The effects of those predictor terms are specified by beta. The left hand side of the formula denotes the start and stop columns in data. This can be omitted, if data inherits from class "epidata" in which case cbind(start, stop) will be used. The epidemic model component is specified by the arguments f and w (and the associated coefficients alpha).

If the epidemic model component is empty and infPeriod always returns Inf, then one actually simulates from a pure Cox model.

The simulation algorithm used is Ogata's modified thinning. For details, see Höhle (2009), Section 4.

Value

An object of class "simEpidata", which is a data.frame with the columns "id", "start", "stop", "atRiskY", "event", "Revent" and the coordinate columns (with the original names from data), which are all obligatory. These columns are followed by all the variables appearing on the rhs of the formula. Last but not least, the generated columns with epidemic covariates corresponding to the functions in the lists f and w are appended.

Note that objects of class "simEpidata" also inherit from class "epidata", thus all "epidata" methods can be applied.

The data.frame is given the additional attributes

"eventTimes": numeric vector of infection time points (sorted chronologically).
"timeRange": numeric vector of length 2: c(min(start), max(stop)).
"coords.cols": numeric vector containing the column indices of the coordinate columns in the resulting data-frame.
"f": this equals the argument f.
"w": this equals the argument w.
"config": a list with elements h0 = h0$exact, beta and alpha.
call: the matched call.
terms: the terms object used.

If nsim > 1 epidemics are simulated by the simulate-method for fitted "twinSIR" models, these are returned in a list.

References

Höhle, M. (2009), Additive-Multiplicative Regression Models for Spatio-Temporal Epidemics, Biometrical Journal, 51(6):961-978.

Author

Sebastian Meyer and Michael Höhle

Examples

## Generate a data frame containing a hypothetic population with 100 individuals
set.seed(1234)
n <- 100
pos <- matrix(rnorm(n*2), ncol=2, dimnames=list(NULL, c("x", "y")))
pop <- data.frame(id=1:n, x=pos[,1], y=pos[,2], 
                  gender=sample(0:1, n, replace=TRUE),
                  I0col=c(rep(1,3),rep(0,n-3)), # 3 initially infectious
                  start=rep(0,n), stop=rep(Inf,n))

## Simulate an SIR epidemic in this population
set.seed(123)
infPeriods <- setNames(c(1:3/10, rexp(n-3, rate=1)), 1:n)
epi <- simEpidata(
    cbind(start,stop) ~ cox(gender), data = pop,
    id.col = "id", I0.col = "I0col", coords.cols = c("x","y"),
    beta = c(-2), h0 = -1, alpha = c(B1=0.1), f = list(B1=function(u) u<=1),
    infPeriod = function(ids) infPeriods[ids],
    ##remPeriod = function(ids) rexp(length(ids), rate=0.1), end = 30   # -> SIRS
)

## extract event times by id
head(summary(epi)$byID)

## Plot the numbers of susceptible, infectious and removed individuals
plot(epi)


## load the 1861 Hagelloch measles epidemic
data("hagelloch")
summary(hagelloch)
plot(hagelloch)

## fit a simplistic twinSIR model
fit <- twinSIR(~ household, data = hagelloch)

## simulate a new epidemic from the above model
## with simulation period = observation period, re-using observed infPeriods
sim1 <- simulate(fit, data = hagelloch)
plot(sim1)

## check if we find similar parameters in the simulated epidemic
fitsim1 <- update(fit, data = sim1)
cbind(base = coef(fit), new = coef(fitsim1))


if (surveillance.options("allExamples")) {

## simulate only 10 days, using random infPeriods ~ Exp(0.1)
sim2 <- simulate(fit, data = hagelloch, seed = 2, end = 10,
    infPeriod = function(ids) rexp(length(ids), rate = 0.1))
plot(sim2)

## simulate from a different model with manually specified parameters
set.seed(321)
simepi <- simEpidata(~ cox(AGE), data = hagelloch,
    beta = c(0.1), h0 = -4, alpha = c(household = 0.05),
    f = list(household = function(u) u == 0),
    infPeriod = function(ids) rexp(length(ids), rate=1/8))
plot(simepi)
intensityplot(simepi)

## see if we correctly estimate the parameters
fitsimepi <- twinSIR(~ cox(AGE) + household, data = simepi)
cbind(true = c(0.05, -4, 0.1), est = coef(fitsimepi), confint(fitsimepi))

}