Simulation of a Self-Exciting Spatio-Temporal Point Process — twinstim

The function simEpidataCS simulates events of a self-exciting spatio-temporal point process of the "twinstim" class. Simulation works via Ogata's modified thinning of the conditional intensity as described in Meyer et al. (2012). Note that simulation is limited to the spatial and temporal range of stgrid.

The simulate method for objects of class "twinstim" simulates new epidemic data using the model and the parameter estimates of the fitted object.

Usage

simEpidataCS(endemic, epidemic, siaf, tiaf, qmatrix, rmarks,
    events, stgrid, tiles, beta0, beta, gamma, siafpars, tiafpars,
    epilink = "log", t0 = stgrid$start[1], T = tail(stgrid$stop,1),
    nEvents = 1e5, control.siaf = list(F=list(), Deriv=list()),
    W = NULL, trace = 5, nCircle2Poly = 32, gmax = NULL, .allocate = 500,
    .skipChecks = FALSE, .onlyEvents = FALSE)

# S3 method for class 'twinstim'
simulate(object, nsim = 1, seed = NULL, data, tiles,
    newcoef = NULL, rmarks = NULL, t0 = NULL, T = NULL, nEvents = 1e5,
    control.siaf = object$control.siaf,
    W = data$W, trace = FALSE, nCircle2Poly = NULL, gmax = NULL,
    .allocate = 500, simplify = TRUE, ...)

Arguments

endemic

see twinstim. Note that type-specific endemic intercepts are specified by beta0 here, not by the term (1|type).

epidemic

see twinstim. Marks appearing in this formula must be returned by the generating function rmarks.

siaf

see twinstim. In addition to what is required for fitting with twinstim, the siaf specification must also contain the element simulate, a function which draws random locations following the spatial kernel siaf$f. The first argument of the function is the number of points to sample (say n), the second one is the vector of parameters siafpars, the third one is the type indicator (a character string matching a type name as specified by dimnames(qmatrix)). With the current implementation there will always be simulated only one location at a time, i.e. n=1. The predefined siaf's all provide simulation.

tiaf

e.g. what is returned by the generating function tiaf.constant or tiaf.exponential. See also twinstim.

qmatrix

see epidataCS. Note that this square matrix and its dimnames determine the number and names of the different event types. In the simplest case, there is only a single type of event, i.e. qmatrix = diag(1).

rmarks

function of single time (1st argument) and location (2nd argument) returning a one-row data.frame of marks (named according to the variables in epidemic) for an event at this point. This must include the columns eps.s and eps.t, i.e. the values of the spatial and temporal interaction ranges at this point. Only "numeric" and "factor" columns are allowed. Assure that factor variables are coded equally (same levels and level order) for each new sample.

For the simulate.twinstim method, the default (NULL) means sampling from the empirical distribution function of the (non-missing) marks in data restricted to events in the simulation period (t0;T]. If there are no events in this period, e.g., if simulating beyond the original observation period, rmarks will sample marks from all of data$events.

events

NULL or missing (default) in case of an empty prehistory, or a SpatialPointsDataFrame containing events of the prehistory (-Inf;t0] of the process (required for the epidemic to start in case of no endemic component in the model). The SpatialPointsDataFrame must have the same proj4string as tiles and W). The attached data.frame (data slot) must contain the typical columns as described in as.epidataCS (time, tile, eps.t, eps.s, and, for type-specific models, type) and all marks appearing in the epidemic specification. Note that some column names are reserved (see as.epidataCS). Only events up to time t0 are selected and taken as the prehistory.

stgrid

see as.epidataCS. Simulation only works inside the spatial and temporal range of stgrid.

tiles

object inheriting from "SpatialPolygons" with row.names matching the tile names in stgrid and having the same proj4string as events and W. This is necessary to sample the spatial location of events generated by the endemic component.

beta0,beta,gamma,siafpars,tiafpars

these are the parameter subvectors of the twinstim. beta and gamma must be given in the same order as they appear in endemic and epidemic, respectively. beta0 is either a single endemic intercept or a vector of type-specific endemic intercepts in the same order as in qmatrix.

epilink

a character string determining the link function to be used for the epidemic linear predictor of event marks. By default, the log-link is used. The experimental alternative is epilink = "identity". Note that the identity link does not guarantee the force of infection to be positive. If this leads to a negative total intensity (endemic + epidemic), the point process is not well defined and simulation cannot proceed.

t0

events having occurred during (-Inf;t0] are regarded as part of the prehistory $H_0$ of the process. For simEpidataCS, by default and also if t0=NULL, the beginning of stgrid is used as t0. For the simulate.twinstim method, NULL means to use the fitted time range of the "twinstim" object.

T, nEvents

simulate a maximum of nEvents events up to time T, then stop. For simEpidataCS, by default, and also if T=NULL, T equals the last stop time in stgrid (it cannot be greater) and nEvents is bounded above by 10000. For the simulate.twinstim method, T=NULL means to use the same same time range as for the fitting of the "twinstim" object.

W

see as.epidataCS. When simulating from twinstim-fits, W is by default taken from the original data$W. If specified as NULL, W is generated automatically via unionSpatialPolygons(tiles). However, since the result of such a polygon operation should always be verified, it is recommended to do that in advance.
It is important that W and tiles cover the same region: on the one hand direct offspring is sampled in the spatial influence region of the parent event, i.e., in the intersection of W and a circle of radius the eps.s of the parent event, after which the corresponding tile is determined by overlay with tiles. On the other hand endemic events are sampled from tiles.

trace

logical (or integer) indicating if (or how often) the current simulation status should be cated. For the simulate.twinstim method, trace currently only applies to the first of the nsim simulations.

.allocate

number of rows (events) to initially allocate for the event history; defaults to 500. Each time the simulated epidemic exceeds the allocated space, the event data.frame will be enlarged by .allocate rows.

.skipChecks,.onlyEvents

these logical arguments are not meant to be set by the user. They are used by the simulate-method for "twinstim" objects.

object

an object of class "twinstim".

nsim

number of epidemics (i.e. spatio-temporal point patterns inheriting from class "epidataCS") to simulate. Defaults to 1 when the result is a simple object inheriting from class "simEpidataCS" (as if simEpidataCS would have been called directly). If nsim > 1, the result will be a list the structure of which depends on the argument simplify.

seed

an object specifying how the random number generator should be initialized for simulation (via set.seed). The initial state will also be stored as an attribute "seed" of the result. The original state of the .Random.seed will be restored at the end of the simulation. By default (NULL), neither initialization nor recovery will be done. This behaviour is copied from the simulate.lm method.

data

an object of class "epidataCS", usually the one to which the "twinstim" object was fitted. It carries the stgrid of the endemic component, but also events for use as the prehistory, and defaults for rmarks and nCircle2Poly.

newcoef

an optional named numeric vector of (a subset of) parameters to replace the original point estimates in coef(object). Elements which do not match any model parameter by name are silently ignored. The newcoefs may also be supplied in a list following the same conventions as for the start argument in twinstim.

simplify

logical. It is strongly recommended to set simplify = TRUE (default) if nsim is large. This saves space and computation time, because for each simulated epidemic only the events component is saved. All other components, which do not vary between simulations, are only stored from the first run. In this case, the runtime of each simulation is stored as an attribute "runtime" to each simulated events. See also the “Value” section below.

control.siaf

see twinstim.

nCircle2Poly

see as.epidataCS. For simulate.twinstim, NULL means to use the same value as for data.

gmax

maximum value the temporal interaction function tiaf$g can attain. If NULL, then it is assumed as the maximum value of the type-specific values at 0, i.e. max(tiaf$g(rep.int(0,nTypes), tiafpars, 1:nTypes)).

...

unused (arguments of the generic).

Value

The function simEpidataCS returns a simulated epidemic of class "simEpidataCS", which enhances the class "epidataCS" by the following additional components known from objects of class "twinstim": bbox, timeRange, formula, coefficients, npars, control.siaf, call, runtime. It has corresponding coeflist, residuals, R0, and intensityplot methods.

The simulate.twinstim method has some additional attributes set on its result: call, seed, and runtime. If nsim > 1, it returns an object of class "simEpidataCSlist", the form of which depends on the value of simplify (which is stored as an attribute simplified): if simplify = FALSE, then the return value is just a list of sequential simulations, each of class "simEpidataCS". However, if simplify = TRUE, then the sequential simulations share all components but the simulated events, i.e. the result is a list with the same components as a single object of class "simEpidataCS", but with events replaced by an eventsList containing the events returned by each of the simulations.

The stgrid component of the returned "simEpidataCS" will be truncated to the actual end of the simulation, which might be $<T$, if the upper bound nEvents is reached during simulation.

CAVE: Currently, simplify=TRUE in simulate.twinstim ignores that multiple simulated epidemics (nsim > 1) may have different stgrid time ranges. In a "simEpidataCSlist", the stgrid shared by all of the simulated epidemics is just the stgrid returned by the first simulation.

Note

The more detailed the polygons in tiles are the slower is the algorithm. You are advised to sacrifice some shape details for speed by reducing the polygon complexity, for example via the mapshaper JavaScript library wrapped by the R package rmapshaper, or via simplify.owin.

References

Douglas, D. H. and Peucker, T. K. (1973): Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: The International Journal for Geographic Information and Geovisualization, 10, 112-122

Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi:10.1111/j.1541-0420.2011.01684.x

Author

Sebastian Meyer, with contributions by Michael Höhle

Examples

data("imdepi", "imdepifit")

## simulation needs the district polygons ('tiles')
load(system.file("shapes", "districtsD.RData", package="surveillance"))
if (surveillance.options("allExamples")) {
  plot(districtsD)
  plot(stateD, add=TRUE, border=2, lwd=2)
}

### simulate from the fitted model, only over the first 90 days (for speed)
nsim <- 10

sims <- simulate(imdepifit, nsim=nsim, seed=1, data=imdepi, T=90,
                 tiles=districtsD, simplify=TRUE)
sims

## plot event times for a random selection of 4 simulations
plot(sims, aggregate="time")

## extract the second realization -> object of class "simEpidataCS"
sim2 <- sims[[2]]
summary(sim2)
plot(sim2, aggregate="space")

## extract _cumulative_ number of events (simulated vs. observed)
getcsums <- function (events) {
    tapply(events$time, events@data["type"],
           function (t) cumsum(table(t)), simplify=FALSE)
}
csums_observed <- getcsums(imdepi$events)
csums_simulated <- lapply(sims$eventsList, getcsums)

## plot it
plotcsums <- function (csums, ...) {
    mapply(function (csum, ...) lines(as.numeric(names(csum)), csum, ...),
           csums, ...)
    invisible()
}
plot(c(0,90), c(0,35), type="n", xlab="Time [days]",
     ylab="Cumulative number of cases")
plotcsums(csums_observed, col=c(2,4), lwd=3)
legend("topleft", legend=levels(imdepi$events$type), col=c(2,4), lwd=1)
invisible(lapply(csums_simulated, plotcsums,
                 col=adjustcolor(c(2,4), alpha.f=0.5)))


if (FALSE) { # \dontrun{
### Experimental code to generate 'nsim' simulations of 'nm2add' months
### beyond the observed time period:
nm2add <- 24
nsim <- 5
### The events still infective by the end of imdepi$stgrid will be used
### as the prehistory for the continued process.

origT <- tail(imdepi$stgrid$stop, 1)
## extend the 'stgrid' by replicating the last block 'nm2add' times
## (i.e., holding "popdensity" constant)
stgridext <- local({
    gLast <- subset(imdepi$stgrid, BLOCK == max(BLOCK))
    gAdd <- gLast[rep(1:nrow(gLast), nm2add),]; rownames(gAdd) <- NULL
    newstart <- seq(origT, by=30, length.out=nm2add)
    newstop <- c(newstart[-1], max(newstart) + 30)
    gAdd$start <- rep(newstart, each=nlevels(gAdd$tile))
    gAdd$stop <- rep(newstop, each=nlevels(gAdd$tile))
    rbind(imdepi$stgrid, gAdd, make.row.names = FALSE)[,-1]
})
## create an updated "epidataCS" with the time-extended 'stgrid'
imdepiext <- update(imdepi, stgrid = stgridext)
newT <- tail(imdepiext$stgrid$stop, 1)

## simulate beyond the original period
simsext <- simulate(imdepifit, nsim=nsim, seed=1, t0=origT, T=newT,
                    data=imdepiext, tiles=districtsD, simplify=TRUE)

## Aside to understand the note from checking events and tiles:
# marks(imdepi)["636",]  # tile 09662 is attributed to this event, but:
# plot(districtsD[c("09678","09662"),], border=1:2, lwd=2, axes=TRUE)
# points(imdepi$events["636",])
## this mismatch is due to polygon simplification

## plot the observed and simulated event numbers over time
plot(imdepiext, breaks=c(unique(imdepi$stgrid$start),origT),
     cumulative=list(maxat=330))
for (i in seq_along(simsext$eventsList))
    plot(simsext[[i]], add=TRUE, legend.types=FALSE,
         breaks=c(unique(simsext$stgrid$start),newT),
         subset=!is.na(source),  # have to exclude the events of the prehistory
         cumulative=list(offset=c(table(imdepi$events$type)), maxat=330, axis=FALSE),
         border=NA, density=0)  # no histogram
abline(v=origT, lty=2, lwd=2)

} # }