Simulation of a Self-Exciting Spatio-Temporal Point Process
twinstim_simulation.Rd
The function simEpidataCS
simulates events of a self-exciting
spatio-temporal point process of the "twinstim"
class.
Simulation works via Ogata's modified thinning of the conditional
intensity as described in Meyer et al. (2012). Note that simulation is
limited to the spatial and temporal range of stgrid
.
The simulate
method for objects of class
"twinstim"
simulates new epidemic data using the model and
the parameter estimates of the fitted object.
Usage
simEpidataCS(endemic, epidemic, siaf, tiaf, qmatrix, rmarks,
events, stgrid, tiles, beta0, beta, gamma, siafpars, tiafpars,
epilink = "log", t0 = stgrid$start[1], T = tail(stgrid$stop,1),
nEvents = 1e5, control.siaf = list(F=list(), Deriv=list()),
W = NULL, trace = 5, nCircle2Poly = 32, gmax = NULL, .allocate = 500,
.skipChecks = FALSE, .onlyEvents = FALSE)
# S3 method for class 'twinstim'
simulate(object, nsim = 1, seed = NULL, data, tiles,
newcoef = NULL, rmarks = NULL, t0 = NULL, T = NULL, nEvents = 1e5,
control.siaf = object$control.siaf,
W = data$W, trace = FALSE, nCircle2Poly = NULL, gmax = NULL,
.allocate = 500, simplify = TRUE, ...)
Arguments
- endemic
see
twinstim
. Note that type-specific endemic intercepts are specified bybeta0
here, not by the term(1|type)
.- epidemic
see
twinstim
. Marks appearing in this formula must be returned by the generating functionrmarks
.- siaf
see
twinstim
. In addition to what is required for fitting withtwinstim
, thesiaf
specification must also contain the elementsimulate
, a function which draws random locations following the spatial kernelsiaf$f
. The first argument of the function is the number of points to sample (sayn
), the second one is the vector of parameterssiafpars
, the third one is the type indicator (a character string matching a type name as specified bydimnames(qmatrix)
). With the current implementation there will always be simulated only one location at a time, i.e.n=1
. The predefined siaf's all provide simulation.- tiaf
e.g. what is returned by the generating function
tiaf.constant
ortiaf.exponential
. See alsotwinstim
.- qmatrix
see
epidataCS
. Note that this square matrix and itsdimnames
determine the number and names of the different event types. In the simplest case, there is only a single type of event, i.e.qmatrix = diag(1)
.- rmarks
function of single time (1st argument) and location (2nd argument) returning a one-row
data.frame
of marks (named according to the variables inepidemic
) for an event at this point. This must include the columnseps.s
andeps.t
, i.e. the values of the spatial and temporal interaction ranges at this point. Only"numeric"
and"factor"
columns are allowed. Assure that factor variables are coded equally (same levels and level order) for each new sample.For the
simulate.twinstim
method, the default (NULL
) means sampling from the empirical distribution function of the (non-missing) marks indata
restricted to events in the simulation period (t0
;T
]. If there are no events in this period, e.g., if simulating beyond the original observation period,rmarks
will sample marks from all ofdata$events
.- events
NULL
or missing (default) in case of an empty prehistory, or aSpatialPointsDataFrame
containing events of the prehistory (-Inf;t0
] of the process (required for the epidemic to start in case of no endemic component in the model). TheSpatialPointsDataFrame
must have the sameproj4string
astiles
andW
). The attacheddata.frame
(data slot) must contain the typical columns as described inas.epidataCS
(time
,tile
,eps.t
,eps.s
, and, for type-specific models,type
) and all marks appearing in theepidemic
specification. Note that some column names are reserved (seeas.epidataCS
). Only events up to timet0
are selected and taken as the prehistory.- stgrid
see
as.epidataCS
. Simulation only works inside the spatial and temporal range ofstgrid
.- tiles
object inheriting from
"SpatialPolygons"
withrow.names
matching thetile
names instgrid
and having the sameproj4string
asevents
andW
. This is necessary to sample the spatial location of events generated by the endemic component.- beta0,beta,gamma,siafpars,tiafpars
these are the parameter subvectors of the
twinstim
.beta
andgamma
must be given in the same order as they appear inendemic
andepidemic
, respectively.beta0
is either a single endemic intercept or a vector of type-specific endemic intercepts in the same order as inqmatrix
.- epilink
a character string determining the link function to be used for the
epidemic
linear predictor of event marks. By default, the log-link is used. The experimental alternative isepilink = "identity"
. Note that the identity link does not guarantee the force of infection to be positive. If this leads to a negative total intensity (endemic + epidemic), the point process is not well defined and simulation cannot proceed.- t0
events
having occurred during (-Inf;t0
] are regarded as part of the prehistory \(H_0\) of the process. ForsimEpidataCS
, by default and also ift0=NULL
, the beginning ofstgrid
is used ast0
. For thesimulate.twinstim
method,NULL
means to use the fitted time range of the"twinstim"
object
.- T, nEvents
simulate a maximum of
nEvents
events up to timeT
, then stop. ForsimEpidataCS
, by default, and also ifT=NULL
,T
equals the last stop time instgrid
(it cannot be greater) andnEvents
is bounded above by 10000. For thesimulate.twinstim
method,T=NULL
means to use the same same time range as for the fitting of the"twinstim"
object
.- W
see
as.epidataCS
. When simulating fromtwinstim
-fits,W
is by default taken from the originaldata$W
. If specified asNULL
,W
is generated automatically viaunionSpatialPolygons(tiles)
. However, since the result of such a polygon operation should always be verified, it is recommended to do that in advance.
It is important thatW
andtiles
cover the same region: on the one hand direct offspring is sampled in the spatial influence region of the parent event, i.e., in the intersection ofW
and a circle of radius theeps.s
of the parent event, after which the corresponding tile is determined by overlay withtiles
. On the other hand endemic events are sampled fromtiles
.- trace
logical (or integer) indicating if (or how often) the current simulation status should be
cat
ed. For thesimulate.twinstim
method,trace
currently only applies to the first of thensim
simulations.- .allocate
number of rows (events) to initially allocate for the event history; defaults to 500. Each time the simulated epidemic exceeds the allocated space, the event
data.frame
will be enlarged by.allocate
rows.- .skipChecks,.onlyEvents
these logical arguments are not meant to be set by the user. They are used by the
simulate
-method for"twinstim"
objects.- object
an object of class
"twinstim"
.- nsim
number of epidemics (i.e. spatio-temporal point patterns inheriting from class
"epidataCS"
) to simulate. Defaults to 1 when the result is a simple object inheriting from class"simEpidataCS"
(as ifsimEpidataCS
would have been called directly). Ifnsim > 1
, the result will be a list the structure of which depends on the argumentsimplify
.- seed
an object specifying how the random number generator should be initialized for simulation (via
set.seed
). The initial state will also be stored as an attribute"seed"
of the result. The original state of the.Random.seed
will be restored at the end of the simulation. By default (NULL
), neither initialization nor recovery will be done. This behaviour is copied from thesimulate.lm
method.- data
an object of class
"epidataCS"
, usually the one to which the"twinstim"
object
was fitted. It carries thestgrid
of the endemic component, but alsoevents
for use as the prehistory, and defaults forrmarks
andnCircle2Poly
.- newcoef
an optional named numeric vector of (a subset of) parameters to replace the original point estimates in
coef(object)
. Elements which do not match any model parameter by name are silently ignored. Thenewcoef
s may also be supplied in a list following the same conventions as for thestart
argument intwinstim
.- simplify
logical. It is strongly recommended to set
simplify = TRUE
(default) ifnsim
is large. This saves space and computation time, because for each simulated epidemic only theevents
component is saved. All other components, which do not vary between simulations, are only stored from the first run. In this case, the runtime of each simulation is stored as an attribute"runtime"
to each simulatedevents
. See also the “Value” section below.- control.siaf
see
twinstim
.- nCircle2Poly
see
as.epidataCS
. Forsimulate.twinstim
,NULL
means to use the same value as fordata
.- gmax
maximum value the temporal interaction function
tiaf$g
can attain. IfNULL
, then it is assumed as the maximum value of the type-specific values at 0, i.e.max(tiaf$g(rep.int(0,nTypes), tiafpars, 1:nTypes))
.- ...
unused (arguments of the generic).
Value
The function simEpidataCS
returns a simulated epidemic of class
"simEpidataCS"
, which enhances the class
"epidataCS"
by the following additional components known from
objects of class "twinstim"
:
bbox
, timeRange
, formula
, coefficients
,
npars
, control.siaf
, call
, runtime
.
It has corresponding coeflist
,
residuals
,
R0
, and
intensityplot
methods.
The simulate.twinstim
method has some additional
attributes set on its result:
call
, seed
, and runtime
.
If nsim > 1
, it returns an object of class
"simEpidataCSlist"
, the form of which depends on the value of
simplify
(which is stored as an attribute simplified
):
if simplify = FALSE
, then the return value is
just a list of sequential simulations, each of class
"simEpidataCS"
. However, if simplify = TRUE
, then the
sequential simulations share all components but the simulated
events
, i.e. the result is a list with the same components as
a single object of class "simEpidataCS"
, but with events
replaced by an eventsList
containing the events
returned
by each of the simulations.
The stgrid
component of the returned "simEpidataCS"
will be truncated to the actual end of the simulation, which might
be \(<T\), if the upper bound nEvents
is reached during
simulation.
CAVE: Currently, simplify=TRUE
in simulate.twinstim
ignores that multiple simulated epidemics
(nsim > 1
) may have different stgrid
time ranges. In a "simEpidataCSlist"
, the stgrid
shared
by all of the simulated epidemics is just the stgrid
returned by the first simulation.
Note
The more detailed the polygons in tiles
are the slower is
the algorithm. You are advised to sacrifice some shape
details for speed by reducing the polygon complexity,
for example via the mapshaper
JavaScript library wrapped by
the R package rmapshaper, or via
simplify.owin
.
References
Douglas, D. H. and Peucker, T. K. (1973): Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: The International Journal for Geographic Information and Geovisualization, 10, 112-122
Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi:10.1111/j.1541-0420.2011.01684.x
See also
The function simEndemicEvents
is a faster alternative
for endemic-only models, only returning a
"SpatialPointsDataFrame"
of simulated events.
The plot.epidataCS
and animate.epidataCS
methods for plotting and animating continuous-space epidemic data,
respectively, also work for simulated epidemics (by inheritance),
and twinstim
can be used to fit
spatio-temporal conditional intensity models also to simulated data.
Examples
data("imdepi", "imdepifit")
## load borders of Germany's districts (originally obtained from
## the German Federal Agency for Cartography and Geodesy,
## https://gdz.bkg.bund.de/), simplified by the "modified Visvalingam"
## algorithm (level=6.6%) using MapShaper.org (v. 0.1.17):
load(system.file("shapes", "districtsD.RData", package="surveillance"))
if (surveillance.options("allExamples")) {
plot(districtsD)
plot(stateD, add=TRUE, border=2, lwd=2)
}
## simulate 2 realizations (over a short period, for speed)
## considering events from data(imdepi) before t=31 as prehistory
## IGNORE_RDIFF_BEGIN
mysims <- simulate(imdepifit, nsim=2, seed=1, data=imdepi,
tiles=districtsD, newcoef=c("e.typeC"=-1),
t0=31, T=if (interactive()) 180 else 45, # for CRAN
simplify=TRUE)
## IGNORE_RDIFF_END
## plot both simulations using the plot-method for simEpidataCSlist's
mysims
plot(mysims, aggregate="time")
## extract the second realization -> object of class simEpidataCS
mysim2 <- mysims[[2]]
summary(mysim2)
plot(mysim2, aggregate="space")
### compare the observed _cumulative_ number of cases in the first 90 days to
nsim <- 20
### simulations from the fitted model
sims <- simulate(imdepifit, nsim=nsim, seed=1, data=imdepi, t0=0, T=90,
tiles=districtsD, simplify=TRUE)
## extract cusums
getcsums <- function (events) {
tapply(events$time, events@data["type"],
function (t) cumsum(table(t)), simplify=FALSE)
}
csums_observed <- getcsums(imdepi$events)
csums_simulated <- lapply(sims$eventsList, getcsums)
## plot it
plotcsums <- function (csums, ...) {
mapply(function (csum, ...) lines(as.numeric(names(csum)), csum, ...),
csums, ...)
invisible()
}
plot(c(0,90), c(0,35), type="n", xlab="Time [days]",
ylab="Cumulative number of cases")
plotcsums(csums_observed, col=c(2,4), lwd=3)
legend("topleft", legend=levels(imdepi$events$type), col=c(2,4), lwd=1)
invisible(lapply(csums_simulated, plotcsums,
col=adjustcolor(c(2,4), alpha.f=0.5)))
if (FALSE) { # \dontrun{
### Experimental code to generate 'nsim' simulations of 'nm2add' months
### beyond the observed time period:
nm2add <- 24
nsim <- 5
### The events still infective by the end of imdepi$stgrid will be used
### as the prehistory for the continued process.
origT <- tail(imdepi$stgrid$stop, 1)
## extend the 'stgrid' by replicating the last block 'nm2add' times
## (i.e., holding "popdensity" constant)
stgridext <- local({
gLast <- subset(imdepi$stgrid, BLOCK == max(BLOCK))
gAdd <- gLast[rep(1:nrow(gLast), nm2add),]; rownames(gAdd) <- NULL
newstart <- seq(origT, by=30, length.out=nm2add)
newstop <- c(newstart[-1], max(newstart) + 30)
gAdd$start <- rep(newstart, each=nlevels(gAdd$tile))
gAdd$stop <- rep(newstop, each=nlevels(gAdd$tile))
rbind(imdepi$stgrid, gAdd, make.row.names = FALSE)[,-1]
})
## create an updated "epidataCS" with the time-extended 'stgrid'
imdepiext <- update(imdepi, stgrid = stgridext)
newT <- tail(imdepiext$stgrid$stop, 1)
## simulate beyond the original period
simsext <- simulate(imdepifit, nsim=nsim, seed=1, t0=origT, T=newT,
data=imdepiext, tiles=districtsD, simplify=TRUE)
## Aside to understand the note from checking events and tiles:
# marks(imdepi)["636",] # tile 09662 is attributed to this event, but:
# plot(districtsD[c("09678","09662"),], border=1:2, lwd=2, axes=TRUE)
# points(imdepi$events["636",])
## this mismatch is due to polygon simplification
## plot the observed and simulated event numbers over time
plot(imdepiext, breaks=c(unique(imdepi$stgrid$start),origT),
cumulative=list(maxat=330))
for (i in seq_along(simsext$eventsList))
plot(simsext[[i]], add=TRUE, legend.types=FALSE,
breaks=c(unique(simsext$stgrid$start),newT),
subset=!is.na(source), # have to exclude the events of the prehistory
cumulative=list(offset=c(table(imdepi$events$type)), maxat=330, axis=FALSE),
border=NA, density=0) # no histogram
abline(v=origT, lty=2, lwd=2)
} # }