twinstim_simulation.Rd
The function simEpidataCS
simulates events of a self-exciting
spatio-temporal point process of the "twinstim"
class.
Simulation works via Ogata's modified thinning of the conditional
intensity as described in Meyer et al. (2012). Note that simulation is
limited to the spatial and temporal range of stgrid
.
The simulate
method for objects of class
"twinstim"
simulates new epidemic data using the model and
the parameter estimates of the fitted object.
simEpidataCS(endemic, epidemic, siaf, tiaf, qmatrix, rmarks,
events, stgrid, tiles, beta0, beta, gamma, siafpars, tiafpars,
epilink = "log", t0 = stgrid$start[1], T = tail(stgrid$stop,1),
nEvents = 1e5, control.siaf = list(F=list(), Deriv=list()),
W = NULL, trace = 5, nCircle2Poly = 32, gmax = NULL, .allocate = 500,
.skipChecks = FALSE, .onlyEvents = FALSE)
# S3 method for twinstim
simulate(object, nsim = 1, seed = NULL, data, tiles,
newcoef = NULL, rmarks = NULL, t0 = NULL, T = NULL, nEvents = 1e5,
control.siaf = object$control.siaf,
W = data$W, trace = FALSE, nCircle2Poly = NULL, gmax = NULL,
.allocate = 500, simplify = TRUE, ...)
see twinstim
. Note that type-specific endemic
intercepts are specified by beta0
here, not by the term
(1|type)
.
see twinstim
. Marks appearing in this formula must
be returned by the generating function rmarks
.
see twinstim
.
In addition to what is required for fitting with twinstim
,
the siaf
specification must also contain the element
simulate
, a function which draws random locations following the
spatial kernel siaf$f
. The first argument of the function is the
number of points to sample (say n
),
the second one is the vector of parameters
siafpars
, the third one is the type indicator (a character string
matching a type name as specified by dimnames(qmatrix)
). With the
current implementation there will always be simulated only one
location at a time, i.e. n=1
.
The predefined siaf's all provide simulation.
e.g. what is returned by the generating function
tiaf.constant
or tiaf.exponential
. See also
twinstim
.
see epidataCS
. Note that this square
matrix and its dimnames
determine the number and names of the
different event types. In the simplest case, there is only a single
type of event, i.e. qmatrix = diag(1)
.
function of single time (1st argument) and location
(2nd argument) returning a one-row data.frame
of marks (named
according to the variables in epidemic
) for an event at this
point. This must include the columns eps.s
and eps.t
,
i.e. the values of the spatial and temporal interaction ranges at this
point. Only "numeric"
and "factor"
columns are
allowed. Assure that factor variables are coded equally
(same levels and level order) for each new sample.
For the simulate.twinstim
method, the default (NULL
)
means sampling from the empirical distribution function of the
(non-missing) marks in data
restricted to events in the
simulation period (t0
;T
]. If there are no events in
this period, e.g., if simulating beyond the original observation
period, rmarks
will sample marks from all of
data$events
.
NULL
or missing (default) in case of an empty prehistory,
or a SpatialPointsDataFrame
containing events of the
prehistory (-Inf;t0
] of the process (required for the
epidemic to start in case of no endemic component in the model).
The SpatialPointsDataFrame
must have the same
proj4string
as tiles
and W
). The attached
data.frame
(data slot) must contain the typical
columns as described in as.epidataCS
(time
,
tile
, eps.t
, eps.s
, and, for type-specific
models, type
) and all marks appearing in the epidemic
specification. Note that some column names are reserved (see
as.epidataCS
). Only events up to
time t0
are selected and taken as the prehistory.
see as.epidataCS
. Simulation only works inside the spatial
and temporal range of stgrid
.
object inheriting from "SpatialPolygons"
with
row.names
matching the tile
names in stgrid
and
having the same proj4string
as events
and W
. This is necessary
to sample the spatial location of events generated by the endemic component.
these are the parameter subvectors of the twinstim
.
beta
and gamma
must be given in the
same order as they appear in endemic
and epidemic
,
respectively. beta0
is either a single endemic intercept or a
vector of type-specific endemic intercepts in the same order as in
qmatrix
.
a character string determining the link function to be used for the
epidemic
linear predictor of event marks. By default, the
log-link is used. The experimental alternative is
epilink = "identity"
. Note that the identity link does not
guarantee the force of infection to be positive. If this leads to a
negative total intensity (endemic + epidemic), the point process is
not well defined and simulation cannot proceed.
events
having occurred during (-Inf;t0
] are regarded
as part of the prehistory \(H_0\) of the process.
For simEpidataCS
, by default and also if t0=NULL
,
the beginning of stgrid
is used as t0
.
For the simulate.twinstim
method, NULL
means to use
the fitted time range of the "twinstim"
object
.
simulate a maximum of nEvents
events up to time T
,
then stop. For simEpidataCS
, by default, and also if
T=NULL
, T
equals the last stop time in stgrid
(it cannot be greater) and nEvents
is bounded above by 10000.
For the simulate.twinstim
method, T=NULL
means to use
the same same time range as for the fitting of the "twinstim"
object
.
see as.epidataCS
. When simulating from
twinstim
-fits, W
is by default taken from the original
data$W
. If specified as NULL
, W
is generated
automatically via unionSpatialPolygons(tiles)
.
However, since the result of such a polygon operation should always
be verified, it is recommended to do that in advance.
It is important that W
and tiles
cover the same region:
on the one hand direct offspring is sampled
in the spatial influence region of the parent event, i.e., in the
intersection of W
and a circle of radius the eps.s
of the
parent event, after which the corresponding tile is determined by
overlay with tiles
. On the other hand endemic events are
sampled from tiles
.
logical (or integer) indicating if (or how often) the current
simulation status should be cat
ed. For the
simulate.twinstim
method, trace
currently only applies
to the first of the nsim
simulations.
number of rows (events) to initially allocate for the event history;
defaults to 500. Each time the simulated epidemic exceeds the
allocated space, the event data.frame
will be enlarged by
.allocate
rows.
these logical arguments are not meant to be set by the user.
They are used by the simulate
-method for "twinstim"
objects.
an object of class "twinstim"
.
number of epidemics (i.e. spatio-temporal point patterns inheriting
from class "epidataCS"
) to simulate. Defaults to 1 when the
result is a simple object inheriting from class
"simEpidataCS"
(as if simEpidataCS
would have been
called directly). If nsim > 1
, the result
will be a list the structure of which depends on the argument
simplify
.
an object specifying how the random number generator should be
initialized for simulation (via set.seed
). The
initial state will also be stored as an attribute "seed"
of
the result. The original state of the .Random.seed
will be restored at the end of the simulation.
By default (NULL
), neither initialization nor recovery will
be done.
This behaviour is copied from the simulate.lm
method.
an object of class "epidataCS"
, usually the one to which the
"twinstim"
object
was fitted. It carries
the stgrid
of the endemic component, but also
events
for use as the prehistory, and defaults for
rmarks
and nCircle2Poly
.
an optional named numeric vector of (a subset of) parameters to
replace the original point estimates in coef(object)
.
Elements which do not match any model parameter by name are silently
ignored. The newcoef
s may also be supplied in a list
following the same conventions as for the start
argument in
twinstim
.
logical. It is strongly recommended to set simplify = TRUE
(default) if nsim
is large. This saves space and computation time,
because for each simulated epidemic only the events
component is
saved. All other components, which do not vary between simulations,
are only stored from the first run. In this case, the runtime of each
simulation is stored as an attribute "runtime"
to each simulated
events
. See also the “Value” section below.
see twinstim
.
see as.epidataCS
. For
simulate.twinstim
, NULL
means to use the same value as
for data
.
maximum value the temporal interaction function
tiaf$g
can attain. If NULL
, then it is assumed as the
maximum value of the type-specific values at 0, i.e.
max(tiaf$g(rep.int(0,nTypes), tiafpars, 1:nTypes))
.
unused (arguments of the generic).
The function simEpidataCS
returns a simulated epidemic of class
"simEpidataCS"
, which enhances the class
"epidataCS"
by the following additional components known from
objects of class "twinstim"
:
bbox
, timeRange
, formula
, coefficients
,
npars
, control.siaf
, call
, runtime
.
It has corresponding coeflist
,
residuals
,
R0
, and
intensityplot
methods.
The simulate.twinstim
method has some additional
attributes set on its result:
call
, seed
, and runtime
.
If nsim > 1
, it returns an object of class
"simEpidataCSlist"
, the form of which depends on the value of
simplify
(which is stored as an attribute simplified
):
if simplify = FALSE
, then the return value is
just a list of sequential simulations, each of class
"simEpidataCS"
. However, if simplify = TRUE
, then the
sequential simulations share all components but the simulated
events
, i.e. the result is a list with the same components as
a single object of class "simEpidataCS"
, but with events
replaced by an eventsList
containing the events
returned
by each of the simulations.
The stgrid
component of the returned "simEpidataCS"
will be truncated to the actual end of the simulation, which might
be \(<T\), if the upper bound nEvents
is reached during
simulation.
CAVE: Currently, simplify=TRUE
in simulate.twinstim
ignores that multiple simulated epidemics
(nsim > 1
) may have different stgrid
time ranges. In a "simEpidataCSlist"
, the stgrid
shared
by all of the simulated epidemics is just the stgrid
returned by the first simulation.
The more detailed the polygons in tiles
are the slower is
the algorithm. You are advised to sacrifice some shape
details for speed by reducing the polygon complexity,
for example via the mapshaper
JavaScript library wrapped by
the R package rmapshaper.
Alternative tools are provided by the packages maptools
(thinnedSpatialPoly
) and spatstat.geom
(simplify.owin
).
Douglas, D. H. and Peucker, T. K. (1973): Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: The International Journal for Geographic Information and Geovisualization, 10, 112-122
Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi: 10.1111/j.1541-0420.2011.01684.x
Sebastian Meyer, with contributions by Michael Höhle
The function simEndemicEvents
is a faster alternative
for endemic-only models, only returning a
"SpatialPointsDataFrame"
of simulated events.
The plot.epidataCS
and animate.epidataCS
methods for plotting and animating continuous-space epidemic data,
respectively, also work for simulated epidemics (by inheritance),
and twinstim
can be used to fit
spatio-temporal conditional intensity models also to simulated data.
data("imdepi", "imdepifit")
## load borders of Germany's districts (originally obtained from
## the German Federal Agency for Cartography and Geodesy,
## https://gdz.bkg.bund.de/), simplified by the "modified Visvalingam"
## algorithm (level=6.6%) using MapShaper.org (v. 0.1.17):
load(system.file("shapes", "districtsD.RData", package="surveillance"))
plot(districtsD)
plot(stateD, add=TRUE, border=2, lwd=2)
# 'stateD' was obtained as 'rgeos::gUnaryUnion(districtsD)'
## simulate 2 realizations (over a short period, for speed)
## considering events from data(imdepi) before t=31 as prehistory
mysims <- simulate(imdepifit, nsim=2, seed=1, data=imdepi,
tiles=districtsD, newcoef=c("e.typeC"=-1),
t0=31, T=if (interactive()) 180 else 45, # for CRAN
simplify=TRUE)
# \dontshow{
## check construction and selection from "simEpidataCSlist"
local({
mysim_from_list <- mysims[[1]]
capture.output(mysim_single <- eval("[[<-"(attr(mysims, "call"), "nsim", 1)))
mysim_from_list$runtime <- mysim_single$runtime <- NULL
stopifnot(all.equal(mysim_single, mysim_from_list,
check.attributes = FALSE))
})
## check equivalence of Lambdag from simulation and residuals via twinstim
stopifnot(all.equal(
residuals(mysims[[1]]),
suppressMessages(surveillance:::residuals.twinstim(surveillance:::as.twinstim.simEpidataCS(mysims[[1]])))
))
# }
## plot both simulations using the plot-method for simEpidataCSlist's
mysims
plot(mysims, aggregate="time")
## extract the second realization -> object of class simEpidataCS
mysim2 <- mysims[[2]]
summary(mysim2)
plot(mysim2, aggregate="space")
if (surveillance.options("allExamples")) {
### compare the observed _cumulative_ number of cases during the
### first 90 days to 20 simulations from the fitted model
### (performing these simulations takes about 30 seconds)
sims <- simulate(imdepifit, nsim=20, seed=1, data=imdepi, t0=0, T=90,
tiles=districtsD, simplify=TRUE)
## extract cusums
getcsums <- function (events) {
tapply(events$time, events@data["type"],
function (t) cumsum(table(t)), simplify=FALSE)
}
csums_observed <- getcsums(imdepi$events)
csums_simulated <- lapply(sims$eventsList, getcsums)
## plot it
plotcsums <- function (csums, ...) {
mapply(function (csum, ...) lines(as.numeric(names(csum)), csum, ...),
csums, ...)
invisible()
}
plot(c(0,90), c(0,35), type="n", xlab="Time [days]",
ylab="Cumulative number of cases")
plotcsums(csums_observed, col=c(2,4), lwd=3)
legend("topleft", legend=levels(imdepi$events$type), col=c(2,4), lwd=1)
invisible(lapply(csums_simulated, plotcsums,
col=adjustcolor(c(2,4), alpha=0.5)))
}
if (FALSE) {
### Experimental code to generate 'nsim' simulations of 'nm2add' months
### beyond the observed time period:
nm2add <- 24
nsim <- 5
### With these settings, simulations will take about 30 seconds.
### The events still infective by the end of imdepi$stgrid will be used
### as the prehistory for the continued process.
origT <- tail(imdepi$stgrid$stop, 1)
## create a time-extended version of imdepi
imdepiext <- local({
## first we have to expand stgrid (assuming constant "popdensity")
g <- imdepi$stgrid
g$stop <- g$BLOCK <- NULL
gadd <- data.frame(start=rep(seq(origT, by=30, length.out=nm2add),
each=nlevels(g$tile)),
g[rep(seq_len(nlevels(g$tile)), nm2add), -1])
## now create an "epidataCS" using this time-extended stgrid
as.epidataCS(events=imdepi$events, # the replacement warnings are ok
W=imdepi$W, qmatrix=imdepi$qmatrix,
stgrid=rbind(g, gadd), T=max(gadd$start) + 30)
})
newT <- tail(imdepiext$stgrid$stop, 1)
## simulate beyond the original period
simsext <- simulate(imdepifit, nsim=nsim, seed=1, t0=origT, T=newT,
data=imdepiext, tiles=districtsD, simplify=TRUE)
## Aside to understand the note from checking events and tiles:
# marks(imdepi)["636",] # tile 09662 is attributed to this event, but:
# plot(districtsD[c("09678","09662"),], border=1:2, lwd=2, axes=TRUE)
# points(imdepi$events["636",])
## this mismatch is due to polygon simplification
## plot the observed and simulated event numbers over time
plot(imdepiext, breaks=c(unique(imdepi$stgrid$start),origT),
cumulative=list(maxat=330))
for (i in seq_along(simsext$eventsList))
plot(simsext[[i]], add=TRUE, legend.types=FALSE,
breaks=c(unique(simsext$stgrid$start),newT),
subset=!is.na(source), # have to exclude the events of the prehistory
cumulative=list(offset=c(table(imdepi$events$type)), maxat=330, axis=FALSE),
border=NA, density=0) # no histogram
abline(v=origT, lty=2, lwd=2)
}