Simulation of Epidemic Data
twinSIR_simulation.Rd
This function simulates the infection (and removal) times of an epidemic. Besides the classical SIR type of epidemic, also SI, SIRS and SIS epidemics are supported. Simulation works via the conditional intensity of infection of an individual, given some (time varying) endemic covariates and/or some distance functions (epidemic components) as well as the fixed positions of the individuals. The lengths of the infectious and removed periods are generated following a pre-specified function (can be deterministic).
The simulate
method for objects of class
"twinSIR"
simulates new epidemic data using the model and
the parameter estimates of the fitted object.
Usage
simEpidata(formula, data, id.col, I0.col, coords.cols, subset,
beta, h0, f = list(), w = list(), alpha, infPeriod,
remPeriod = function(ids) rep(Inf, length(ids)),
end = Inf, trace = FALSE, .allocate = NULL)
# S3 method for class 'twinSIR'
simulate(object, nsim = 1, seed = 1,
infPeriod = NULL, remPeriod = NULL,
end = diff(range(object$intervals)), trace = FALSE, .allocate = NULL,
data = object$data, ...)
Arguments
- formula
an object of class
"formula"
(or one that can be coerced to that class): a symbolic description of the intensity model to be estimated. The details of model specification are given under Details.- data
a data.frame containing the variables in
formula
and the variables specified byid.col
,I0.col
andcoords.col
(see below). It represents the “history” of the endemic covariates to use for the simulation. The form is similar to and can be an object of class"epidata"
. The simulation period is split up into consecutive intervals of constant endemic covariables. The data frame consists of a block of N (number of individuals) rows for each of those time intervals (all rows in a block share the same start and stop values... therefore the name “block”), where there is one row per individual in the block. Each row describes the (fixed) state of the endemic covariates of the individual during the time interval given by the start and stop columns (specified through the lhs offormula
).For the
simulate
method of class"twinSIR"
this should be the object of class"epidata"
used for the fit. This is a part of the return value of the functiontwinSIR
, if called with argumentkeep.data
set toTRUE
.- id.col
only if
data
does not inherit fromepidata
: single index of theid
column indata
. Can be numeric (by column number) or character (by column name).
Theid
column identifies the individuals in the data-frame. It will be converted to a factor variable and its levels serve also to identify individuals as argument to theinfPeriod
function.- I0.col
only if
data
does not inherit fromepidata
: single index of theI0
column indata
. Can be numeric (by column number), character (by column name) orNULL
.
TheI0
column indicates if an individual is initially infectious, i.e. it is already infectious at the beginning of the first time block. SettingI0.col = NULL
is short for “there are no initially infectious individuals”. Otherwise, the variable must be logical or in 0/1-coding. As this variable is constant over time the initially infectious individuals are derived from the first time block only.- coords.cols
only if
data
does not inherit fromepidata
: indexes of thecoords
columns indata
. Can be a numeric (by column number), a character (by column name) vector orNULL
.
These columns contain the coordinates of the individuals. It must be emphasized that the functions in this package currently assume fixed positions of the individuals during the whole epidemic. Thus, an individual has the same coordinates in every block. For simplicity, the coordinates are derived from the first time block only. The epidemic covariates are calculated based on the Euclidean distance between the individuals, seef
.- subset
an optional vector specifying a subset of the covariate history to be used in the simulation.
- beta
numeric vector of length equal the number of endemic (
cox
) terms on the rhs offormula
. It contains the effects of the endemic predictor (excluding the log-baselineh0
, see below) in the same order as in the formula.- h0
either a single number to specify a constant baseline hazard (equal to
exp(h0)
) or a list of functions namedexact
andupper
. In the latter case,h0$exact
is the true log-baseline hazard function andh0$upper
is a piecewise constant upper bound forh0$exact
. The functionh0$upper
must inherit fromstepfun
withright=FALSE
. Theoretically, the intensity function is left-continuous, thusright=TRUE
would be adequate, but in the implementation, when we evaluate the intensity at theknots
(change points) ofh0$upper
we need its value for the subsequent interval.- f, w
see
as.epidata
.- alpha
a named numeric vector of coefficients for the epidemic covariates generated by
f
andw
. The names are matched againstnames(f)
andnames(w)
. Remember thatalpha >= 0
.- infPeriod
a function generating lengths of infectious periods. It should take one parameter (e.g.
ids
), which is a character vector of id's of individuals, and return appropriate infection periods for those individuals. Therefore, the value of the function should be of lengthlength(ids)
. For example, for independent and identically distributed infection periods following \(Exp(1)\), the generating function isfunction(ids) rexp(length(ids), rate=1)
. For a constant infectious period of length c, it is sufficient to setfunction (x) {c}
.
For thesimulate
method of class"twinSIR"
only, this can also beNULL
(the default), which means that the observed infectious periods of infected individuals are re-used when simulating a new epidemic and individuals with missing infectious periods (i.e. infection and recovery was not observed) are attributed to the mean observed infectious period.Note that it is even possible to simulate an SI-epidemic by setting
infPeriod = function (x) {Inf}
In other words: once an individual became infected it spreads the disease forever, i.e. it will never be removed.
- remPeriod
a function generating lengths of removal periods. Per default, once an individual was removed it will stay in this state forever (
Inf
). Therefore, it will not become at-risk (S) again and re-infections are not possible. Alternatively, always returning 0 as length of the removal period corresponds to a SIS epidemic. Any other values correspond to SIRS. Note thatend
should be set to a finite value in these cases.- end
a single positive numeric value specifying the time point at which the simulation should be forced to end. By default, this is
Inf
, i.e. the simulation continues until there is no susceptible individual left.
For thesimulate
method of class"twinSIR"
the default is to have equal simulation and observation periods.- trace
logical (or integer) indicating if (or how often) the sets of susceptible and infected individuals as well as the rejection indicator (of the rejection sampling step) should be
cat
ed. Defaults toFALSE
.- .allocate
number of blocks to initially allocate for the event history (i.e.
.allocate*N
rows). By default (NULL
), this number is set tomax(500, ceiling(nBlocks/100)*100)
, i.e. 500 but at least the number of blocks indata
(rounded to the next multiple of 100). Each time the simulated epidemic exceeds the allocated space, the event history will be enlarged by.allocate
blocks.- object
an object of class
"twinSIR"
. This must contain the originaldata
used for the fit (seedata
).- nsim
number of epidemics to simulate. Defaults to 1.
- seed
an integer that will be used in the call to
set.seed
before simulating the epidemics.- ...
unused (argument of the generic).
Details
A model is specified through the formula
, which has the form
cbind(start, stop) ~ cox(endemicVar1) * cox(endemicVar2)
,
i.e. the right hand side has the usual form as in lm
, but
all variables are marked as being endemic by the special function
cox
. The effects of those predictor terms are specified by
beta
. The left hand side of the formula denotes the start
and stop columns in data
. This can be omitted, if data
inherits
from class "epidata"
in which case cbind(start, stop)
will be
used. The epidemic model component is specified by the arguments
f
and w
(and the associated coefficients alpha
).
If the epidemic model component is empty and infPeriod
always returns Inf
, then one actually simulates from a pure Cox model.
The simulation algorithm used is Ogata's modified thinning. For details, see Höhle (2009), Section 4.
Value
An object of class "simEpidata"
, which is a data.frame
with the
columns "id"
, "start"
, "stop"
, "atRiskY"
,
"event"
, "Revent"
and the coordinate columns (with the original
names from data
), which are all obligatory. These columns are followed
by all the variables appearing on the rhs of the formula
. Last but not
least, the generated columns with epidemic covariates corresponding to the
functions in the lists f
and w
are appended.
Note that objects of class "simEpidata"
also inherit from class
"epidata"
, thus all "epidata"
methods can be
applied.
The data.frame
is given the additional attributes
- "eventTimes"
numeric vector of infection time points (sorted chronologically).
- "timeRange"
numeric vector of length 2:
c(min(start), max(stop))
.- "coords.cols"
numeric vector containing the column indices of the coordinate columns in the resulting data-frame.
- "f"
this equals the argument
f
.- "w"
this equals the argument
w
.- "config"
a list with elements
h0 = h0$exact
,beta
andalpha
.- call
the matched call.
- terms
the
terms
object used.
If nsim > 1
epidemics are simulated by the
simulate
-method for fitted "twinSIR"
models, these are
returned in a list.
References
Höhle, M. (2009), Additive-Multiplicative Regression Models for Spatio-Temporal Epidemics, Biometrical Journal, 51(6):961-978.
See also
The plot.epidata
and animate.epidata
methods
for plotting and animating (simulated) epidemic data, respectively.
The intensityplot.simEpidata
method for plotting paths of
infection intensities.
Function twinSIR
for fitting spatio-temporal epidemic intensity
models to epidemic data.
Examples
## Generate a data frame containing a hypothetic population with 100 individuals
set.seed(1234)
n <- 100
pos <- matrix(rnorm(n*2), ncol=2, dimnames=list(NULL, c("x", "y")))
pop <- data.frame(id=1:n, x=pos[,1], y=pos[,2],
gender=sample(0:1, n, replace=TRUE),
I0col=c(rep(1,3),rep(0,n-3)), # 3 initially infectious
start=rep(0,n), stop=rep(Inf,n))
## Simulate an SIR epidemic in this population
set.seed(123)
infPeriods <- setNames(c(1:3/10, rexp(n-3, rate=1)), 1:n)
epi <- simEpidata(
cbind(start,stop) ~ cox(gender), data = pop,
id.col = "id", I0.col = "I0col", coords.cols = c("x","y"),
beta = c(-2), h0 = -1, alpha = c(B1=0.1), f = list(B1=function(u) u<=1),
infPeriod = function(ids) infPeriods[ids],
##remPeriod = function(ids) rexp(length(ids), rate=0.1), end = 30 # -> SIRS
)
## extract event times by id
head(summary(epi)$byID)
## Plot the numbers of susceptible, infectious and removed individuals
plot(epi)
## load the 1861 Hagelloch measles epidemic
data("hagelloch")
summary(hagelloch)
plot(hagelloch)
## fit a simplistic twinSIR model
fit <- twinSIR(~ household, data = hagelloch)
## simulate a new epidemic from the above model
## with simulation period = observation period, re-using observed infPeriods
sim1 <- simulate(fit, data = hagelloch)
plot(sim1)
## check if we find similar parameters in the simulated epidemic
fitsim1 <- update(fit, data = sim1)
cbind(base = coef(fit), new = coef(fitsim1))
if (surveillance.options("allExamples")) {
## simulate only 10 days, using random infPeriods ~ Exp(0.1)
sim2 <- simulate(fit, data = hagelloch, seed = 2, end = 10,
infPeriod = function(ids) rexp(length(ids), rate = 0.1))
plot(sim2)
## simulate from a different model with manually specified parameters
set.seed(321)
simepi <- simEpidata(~ cox(AGE), data = hagelloch,
beta = c(0.1), h0 = -4, alpha = c(household = 0.05),
f = list(household = function(u) u == 0),
infPeriod = function(ids) rexp(length(ids), rate=1/8))
plot(simepi)
intensityplot(simepi)
## see if we correctly estimate the parameters
fitsimepi <- twinSIR(~ cox(AGE) + household, data = simepi)
cbind(true = c(0.05, -4, 0.1), est = coef(fitsimepi), confint(fitsimepi))
}