twinSIR_simulation.Rd
This function simulates the infection (and removal) times of an epidemic. Besides the classical SIR type of epidemic, also SI, SIRS and SIS epidemics are supported. Simulation works via the conditional intensity of infection of an individual, given some (time varying) endemic covariates and/or some distance functions (epidemic components) as well as the fixed positions of the individuals. The lengths of the infectious and removed periods are generated following a pre-specified function (can be deterministic).
The simulate
method for objects of class
"twinSIR"
simulates new epidemic data using the model and
the parameter estimates of the fitted object.
simEpidata(formula, data, id.col, I0.col, coords.cols, subset,
beta, h0, f = list(), w = list(), alpha, infPeriod,
remPeriod = function(ids) rep(Inf, length(ids)),
end = Inf, trace = FALSE, .allocate = NULL)
# S3 method for twinSIR
simulate(object, nsim = 1, seed = 1,
infPeriod = NULL, remPeriod = NULL,
end = diff(range(object$intervals)), trace = FALSE, .allocate = NULL,
data = object$data, ...)
an object of class "formula"
(or one that can be coerced to
that class): a symbolic description of the intensity model to be estimated.
The details of model specification are given under Details.
a data.frame containing the variables in formula
and the variables
specified by id.col
, I0.col
and coords.col
(see below).
It represents the “history” of the endemic covariates to use for the
simulation. The form is similar to and can be an object of class
"epidata"
. The simulation period is split up into
consecutive intervals of constant endemic covariables. The
data frame consists of a block of N (number of individuals) rows for each of
those time intervals (all rows in a block share the same start and stop
values... therefore the name “block”), where there is one row per
individual in the block. Each row describes the (fixed) state of the
endemic covariates of the individual during the time interval given by the
start and stop columns (specified through the lhs of formula
).
For the simulate
method of class "twinSIR"
this should be
the object of class "epidata"
used for the fit. This is a
part of the return value of the function twinSIR
, if called with
argument keep.data
set to TRUE
.
only if data
does not inherit from epidata
:
single index of the id
column in data
. Can be numeric
(by column number) or character (by column name).
The id
column identifies the individuals in the data-frame. It will
be converted to a factor variable and its levels serve also to identify
individuals as argument to the infPeriod
function.
only if data
does not inherit from epidata
:
single index of the I0
column in data
. Can be numeric
(by column number), character (by column name) or NULL
.
The I0
column indicates if an individual is initially infectious,
i.e. it is already infectious at the beginning of the first time block.
Setting I0.col = NULL
is short for “there are no initially
infectious individuals”. Otherwise, the variable must be logical or in
0/1-coding. As this variable is constant over time the initially
infectious individuals are derived from the first time block only.
only if data
does not inherit from epidata
:
indexes of the coords
columns in data
. Can be a
numeric (by column number), a character (by column name) vector or
NULL
.
These columns contain the coordinates of the individuals. It must be
emphasized that the functions in this package currently assume fixed
positions of the individuals during the whole epidemic. Thus, an
individual has the same coordinates in every block. For simplicity, the
coordinates are derived from the first time block only. The epidemic
covariates are calculated based on the Euclidian distance between the
individuals, see f
.
an optional vector specifying a subset of the covariate history to be used in the simulation.
numeric vector of length equal the number of endemic (cox
) terms on
the rhs of formula
. It contains the effects of the endemic predictor
(excluding the log-baseline h0
, see below) in the same order as in
the formula.
either a single number to specify a constant baseline hazard
(equal to exp(h0)
) or a list of functions named
exact
and upper
. In the latter case, h0$exact
is the true log-baseline hazard function and h0$upper
is a
piecewise constant upper bound for h0$exact
. The function
h0$upper
must inherit from stepfun
with
right=FALSE
. Theoretically, the intensity function is
left-continuous, thus right=TRUE
would be adequate, but in the
implementation, when we evaluate the intensity at the
knots
(change points) of h0$upper
we need its value
for the subsequent interval.
see as.epidata
.
a named numeric vector of coefficients for the epidemic
covariates generated by f
and w
. The names are matched
against names(f)
and names(w)
.
Remember that alpha >= 0
.
a function generating lengths of infectious periods. It should take one
parameter (e.g. ids
), which is a character vector of id's of
individuals, and return appropriate infection periods for those
individuals. Therefore, the value of the function should be of length
length(ids)
. For example, for independent and identically
distributed infection periods following \(Exp(1)\),
the generating function is function(ids) rexp(length(ids), rate=1)
.
For a constant infectious period of length c, it is sufficient to set
function (x) {c}
.
For the simulate
method of class "twinSIR"
only, this can
also be NULL
(the default), which means that the observed infectious
periods of infected individuals are re-used when simulating a new epidemic
and individuals with missing infectious periods (i.e. infection and
recovery was not observed) are attributed to the mean observed
infectious period.
Note that it is even possible to simulate an SI-epidemic by setting
infPeriod = function (x) {Inf}
In other words: once an individual became infected it spreads the disease forever, i.e. it will never be removed.
a function generating lengths of removal periods. Per default, once an
individual was removed it will stay in this state forever (Inf
).
Therefore, it will not become at-risk (S) again and re-infections are not
possible. Alternatively, always returning 0 as length of the removal
period corresponds to a SIS epidemic. Any other values correspond to SIRS.
Note that end
should be set to a finite value in these cases.
a single positive numeric value specifying the time point at which the
simulation should be forced to end. By default, this is Inf
, i.e.
the simulation continues until there is no susceptible individual left.
For the simulate
method of class "twinSIR"
the default is
to have equal simulation and observation periods.
logical (or integer) indicating if (or how often) the sets of susceptible
and infected individuals as well as the rejection indicator (of the
rejection sampling step) should be cat
ed. Defaults to FALSE
.
number of blocks to initially allocate for the event history (i.e.
.allocate*N
rows). By default (NULL
), this number is set to
max(500, ceiling(nBlocks/100)*100)
, i.e. 500 but at least the
number of blocks in data
(rounded to the next multiple of 100).
Each time the simulated epidemic exceeds the allocated space, the
event history will be enlarged by .allocate
blocks.
an object of class "twinSIR"
. This must contain the original
data
used for the fit (see data
).
number of epidemics to simulate. Defaults to 1.
an integer that will be used in the call to set.seed
before
simulating the epidemics.
unused (argument of the generic).
A model is specified through the formula
, which has the form
cbind(start, stop) ~ cox(endemicVar1) * cox(endemicVar2)
,
i.e. the right hand side has the usual form as in lm
, but
all variables are marked as being endemic by the special function
cox
. The effects of those predictor terms are specified by
beta
. The left hand side of the formula denotes the start
and stop columns in data
. This can be omitted, if data
inherits
from class "epidata"
in which case cbind(start, stop)
will be
used. The epidemic model component is specified by the arguments
f
and w
(and the associated coefficients alpha
).
If the epidemic model component is empty and infPeriod
always returns Inf
, then one actually simulates from a pure Cox model.
The simulation algorithm used is Ogata's modified thinning. For details, see Höhle (2009), Section 4.
An object of class "simEpidata"
, which is a data.frame
with the
columns "id"
, "start"
, "stop"
, "atRiskY"
,
"event"
, "Revent"
and the coordinate columns (with the original
names from data
), which are all obligatory. These columns are followed
by all the variables appearing on the rhs of the formula
. Last but not
least, the generated columns with epidemic covariates corresponding to the
functions in the lists f
and w
are appended.
Note that objects of class "simEpidata"
also inherit from class
"epidata"
, thus all "epidata"
methods can be
applied.
The data.frame
is given the additional attributes
numeric vector of infection time points (sorted chronologically).
numeric vector of length 2: c(min(start), max(stop))
.
numeric vector containing the column indices of the coordinate columns in the resulting data-frame.
this equals the argument f
.
this equals the argument w
.
a list with elements h0 = h0$exact
, beta
and alpha
.
the matched call.
the terms
object used.
Höhle, M. (2009), Additive-Multiplicative Regression Models for Spatio-Temporal Epidemics, Biometrical Journal, 51(6):961-978.
Sebastian Meyer and Michael Höhle
The plot.epidata
and animate.epidata
methods
for plotting and animating (simulated) epidemic data, respectively.
The intensityplot.simEpidata
method for plotting paths of
infection intensities.
Function twinSIR
for fitting spatio-temporal epidemic intensity
models to epidemic data.
## Generate a data frame containing a hypothetic population with 100 individuals
set.seed(1234)
n <- 100
pos <- matrix(rnorm(n*2), ncol=2, dimnames=list(NULL, c("x", "y")))
pop <- data.frame(id=1:n, x=pos[,1], y=pos[,2],
gender=sample(0:1, n, replace=TRUE),
I0col=c(rep(1,3),rep(0,n-3)), # 3 initially infectious
start=rep(0,n), stop=rep(Inf,n))
## Simulate an SIR epidemic in this population
set.seed(123)
infPeriods <- setNames(c(1:3/10, rexp(n-3, rate=1)), 1:n)
epi <- simEpidata(
cbind(start,stop) ~ cox(gender), data = pop,
id = "id", I0.col = "I0col", coords.cols = c("x","y"),
beta = c(-2), h0 = -1, alpha = c(B1=0.1), f = list(B1=function(u) u<=1),
infPeriod = function(ids) infPeriods[ids],
##remPeriod = function(ids) rexp(length(ids), rate=0.1), end = 30 # -> SIRS
)
## extract event times by id
head(summary(epi)$byID)
## Plot the numbers of susceptible, infectious and removed individuals
plot(epi)
## load the 1861 Hagelloch measles epidemic
data("hagelloch")
summary(hagelloch)
plot(hagelloch)
## fit a simplistic twinSIR model
fit <- twinSIR(~ household, data = hagelloch)
## simulate a new epidemic from the above model
## with simulation period = observation period, re-using observed infPeriods
sim1 <- simulate(fit, data = hagelloch)
plot(sim1)
## check if we find similar parameters in the simulated epidemic
fitsim1 <- update(fit, data = sim1)
cbind(base = coef(fit), new = coef(fitsim1))
if (surveillance.options("allExamples"))
{
## simulate only 10 days, using random infPeriods ~ Exp(0.1)
sim2 <- simulate(fit, data = hagelloch, seed = 2, end = 10,
infPeriod = function(ids) rexp(length(ids), rate = 0.1))
plot(sim2)
## simulate from a different model with manually specified parameters
set.seed(321)
simepi <- simEpidata(~ cox(AGE), data = hagelloch,
beta = c(0.1), h0 = -4, alpha = c(household = 0.05),
f = list(household = function(u) u == 0),
infPeriod = function(ids) rexp(length(ids), rate=1/8))
plot(simepi)
intensityplot(simepi)
## see if we correctly estimate the parameters
fitsimepi <- twinSIR(~ cox(AGE) + household, data = simepi)
cbind(true = c(0.05, -4, 0.1), est = coef(fitsimepi), confint(fitsimepi))
}