epidataCS.Rd
Data structure for continuous spatio-temporal event data, e.g. individual
case reports of an infectious disease. Apart from the actual events, the
class simultaneously holds a spatio-temporal grid of endemic covariates
(similar to disease mapping) and a representation of the observation region.
The "epidataCS" class is the basis for fitting spatio-temporal
endemic-epidemic intensity models with the function twinstim
(Meyer et al., 2012). The implementation is described in
Meyer et al. (2017, Section 3), see vignette("twinstim").
as.epidataCS(events, stgrid, W, qmatrix = diag(nTypes),
nCircle2Poly = 32L, T = NULL,
clipper = c("polyclip", "rgeos"), verbose = interactive())
# S3 method for epidataCS
print(x, n = 6L, digits = getOption("digits"), ...)
# S3 method for epidataCS
nobs(object, ...)
# S3 method for epidataCS
head(x, n = 6L, ...)
# S3 method for epidataCS
tail(x, n = 6L, ...)
# S3 method for epidataCS
x[i, j, ..., drop = TRUE]
# S3 method for epidataCS
subset(x, subset, select, drop = TRUE, ...)
# S3 method for epidataCS
marks(x, coords = TRUE, ...)
# S3 method for epidataCS
summary(object, ...)
# S3 method for summary.epidataCS
print(x, ...)
# S3 method for epidataCS
as.stepfun(x, ...)
getSourceDists(object, dimension = c("space", "time"))
events: a "SpatialPointsDataFrame" of cases with the following obligatory
columns (in the events@data data.frame):
time: time point of event. Will be converted to a numeric variable by
as.numeric. There should be no concurrent events (but see untie for an
ex post adjustment) and there cannot be events beyond stgrid (i.e.,
time <= T is required). Events at or before time
\(t_0\) = min(stgrid$start) are allowed and form the prehistory of the
process.
tile: the spatial region (tile) where the event is located. This links to
the tiles of stgrid.
type: optional type of event in a marked twinstim model. Will be converted
to a factor variable dropping unused levels. If missing, all events will be
attributed the single type "1".
eps.t: maximum temporal influence radius (e.g. length of infectious period,
time to culling, etc.); must be positive and may be Inf.
eps.s: maximum spatial influence radius (e.g. 100 [km]); must be positive
and may be Inf. A compact influence region mainly has computational
advantages, but might also be plausible for specific applications.
The data.frame may contain columns with further marks of the events, e.g.
sex, age of infected individuals, which may be used as epidemic covariates
influencing infectiousness. Note that some auxiliary columns will be added
at conversion whose names are reserved: ".obsInfLength", ".bdist",
".influenceRegion", and ".sources", as well as "start", "BLOCK", and all
endemic covariates' names from stgrid.
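The column requirements above can be illustrated with a small base R sketch. It builds only the attribute table (what would become events@data), not a "SpatialPointsDataFrame". The toy values and the names ev, T_end and reserved are assumptions made for illustration, and the checks mimic the documented requirements rather than the package's actual input validation.

```r
## Toy attribute table with the obligatory columns (all values are made up)
ev <- data.frame(
  time  = c(2.5, 0.7, 5.1, 9.8),
  tile  = c("A", "B", "A", "B"),
  type  = c("B", "B", "C", "B"),
  eps.t = 30,               # maximum temporal influence radius
  eps.s = 200,              # maximum spatial influence radius (may be Inf)
  age   = c(3, 17, 45, 8)   # an additional mark (potential epidemic covariate)
)
T_end <- 10                 # hypothetical end of the observation period

## conversions described above
ev$time <- as.numeric(ev$time)
ev$type <- factor(ev$type)

## documented requirements, checked by hand
no_ties      <- !anyDuplicated(ev$time)        # no concurrent events
within_grid  <- all(ev$time <= T_end)          # no events beyond the grid
positive_eps <- all(ev$eps.t > 0, ev$eps.s > 0)
reserved <- c(".obsInfLength", ".bdist", ".influenceRegion", ".sources",
              "start", "BLOCK")
no_reserved  <- !any(names(ev) %in% reserved)  # reserved names are unused

## events are kept in chronological order
ev <- ev[order(ev$time), ]
```

In a real application, the endemic covariate names of stgrid would have to be added to the reserved vector as well.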
stgrid: a data.frame describing endemic covariates on a full
spatio-temporal region x interval grid (e.g., district x week), which is a
decomposition of the observation region W and period \((t_0, T]\). This
means that for every combination of spatial region and time interval there
must be exactly one row in this data.frame, that the union of the spatial
tiles equals W, the union of the time intervals equals \((t_0, T]\), and
that regions (and intervals) are non-overlapping.
There are the following obligatory columns:
tile: ID of the spatial region (e.g., district ID). It will be converted to
a factor variable (dropping unused levels if it already was one).
start, stop: columns describing the consecutive temporal intervals
(converted to numeric variables by as.numeric). The start time of an
interval must be equal to the stop time of the previous interval. The stop
column may be missing, in which case it will be auto-generated from the set
of start values and T.
area: area of the spatial region (tile). Be aware that the unit of this
area (e.g., square km) must be consistent with the units of W and events
(as specified in their proj4strings).
The remaining columns are endemic covariates. Note that the column name
"BLOCK" is reserved (a column which will be added automatically for
indexing the time intervals of stgrid).
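A full tile x interval grid as described above could be assembled as in the following base R sketch. The tile IDs, start times, areas, the covariate popdensity and the name T_end are hypothetical, and the derivation of the stop column only imitates the documented auto-generation from the start values and T.

```r
## Hypothetical tiles, interval start times, and end of observation period
tiles  <- c("A", "B", "C")
starts <- c(0, 7, 14, 21)
T_end  <- 28

## full tile x interval grid: exactly one row per combination
stgrid <- expand.grid(tile = tiles, start = starts,
                      KEEP.OUT.ATTRS = FALSE, stringsAsFactors = TRUE)

## derive 'stop' from the set of start values and T_end
stops <- c(starts[-1], T_end)
stgrid$stop <- stops[match(stgrid$start, starts)]

## tile areas (e.g., in square km) and one made-up endemic covariate
areas <- c(A = 120, B = 85.5, C = 240)
stgrid$area <- unname(areas[as.character(stgrid$tile)])
stgrid$popdensity <- c(50, 300, 20)[as.integer(stgrid$tile)]

## sanity checks corresponding to the requirements stated above
complete   <- all(table(stgrid$tile, stgrid$start) == 1)  # one row per cell
contiguous <- all(head(stops, -1) == starts[-1])          # stop == next start
covers     <- min(stgrid$start) == 0 && max(stgrid$stop) == T_end
```

Here, t_0 is min(stgrid$start) = 0, so the grid covers the period from 0 to 28 in four intervals of length 7.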
W: an object of class "SpatialPolygons" representing the observation
region. It must have the same proj4string as events and all events must be
within W. Prior simplification of W may considerably reduce the
computational burden of likelihood evaluations in twinstim models with
non-trivial spatial interaction functions (see the “Note” section below).
qmatrix: a square indicator matrix (0/1 or FALSE/TRUE) for possible
transmission between the event types. The matrix will be internally
converted to logical. Defaults to an independent spread of the event
types, i.e. the identity matrix.
nCircle2Poly: accuracy (number of edges) of the polygonal approximation of
a circle, see discpoly.
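To get a feeling for the effect of the number of edges, the following base R sketch approximates a disc by a regular polygon and reports the share of the disc area that the polygon covers. The helpers circle_poly and poly_area are hypothetical and are not part of the package. The sketch assumes that the vertices lie on the circle, which need not be how discpoly constructs its polygons.

```r
## vertices of a regular polygon with n edges inscribed in a circle
circle_poly <- function(center = c(0, 0), radius = 1, n = 32L) {
  theta <- seq(0, 2 * pi, length.out = n + 1L)[-(n + 1L)]
  cbind(x = center[1] + radius * cos(theta),
        y = center[2] + radius * sin(theta))
}

## polygon area by the shoelace formula
poly_area <- function(xy) {
  nxt <- c(2:nrow(xy), 1L)   # index of each vertex's successor
  abs(sum(xy[, "x"] * xy[nxt, "y"] - xy[nxt, "x"] * xy[, "y"])) / 2
}

## share of the true disc area covered by the approximation
coverage <- sapply(c(8L, 16L, 32L, 64L), function(n)
  poly_area(circle_poly(radius = 200, n = n)) / (pi * 200^2))
round(coverage, 4)
```

Under this assumption the covered share increases with the number of edges and is always below 1, so a larger nCircle2Poly trades computing time for a closer approximation.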
T: end of observation period (i.e. last stop time of stgrid). Must be
specified if the start but not the stop times are supplied in stgrid
(=> auto-generation of stop times).
clipper: polygon clipping engine to use for calculating the
.influenceRegions of events (see the Value section below). Default is the
polyclip package (called via intersect.owin from package spatstat.geom).
In surveillance <= 1.6-0, package gpclib was used, which has a restrictive
license. This is no longer supported.
verbose: logical indicating if status messages should be printed during
input checking and "epidataCS" generation. The default is to do so in
interactive R sessions.
x: an object of class "epidataCS" or "summary.epidataCS", respectively.
n: a single integer. If positive, the first (head, print) / last (tail) n
events are extracted. If negative, all but the n first/last events are
extracted.
digits: minimum number of significant digits to be printed in values.
i, j, drop: arguments passed to the [-method for SpatialPointsDataFrames
for subsetting the events while retaining stgrid and W. If drop=TRUE (the
default), event types that completely disappear due to i-subsetting will be
dropped, which reduces qmatrix and the factor levels of the type column.
By the j index, epidemic covariates can be removed from events.
...: unused (arguments of the generics) with a few exceptions: The print
method for "epidataCS" passes ... to the print.data.frame method, and the
print method for "summary.epidataCS" passes additional arguments to
print.table.
subset, select: arguments used to subset the events from an "epidataCS"
object like in subset.data.frame.
coords: logical indicating if the data frame of event marks returned by
marks(x) should have the event coordinates appended as last columns. This
defaults to TRUE.
object: an object of class "epidataCS".
dimension: the distances of all events to their potential source events can
be computed in either the "space" or "time" dimension.
The function as.epidataCS is used to generate objects of class
"epidataCS", which is the data structure required for twinstim models.
The [-method for class "epidataCS" ensures that the subsetted object will
be valid; for instance, it updates the auxiliary list of potential
transmission paths stored in the object. The [-method is used in
subset.epidataCS, which is implemented similarly to subset.data.frame.
The print method for "epidataCS" prints some metadata of the epidemic,
e.g., the observation period, the dimensions of the spatio-temporal grid,
the types of events, and the total number of events. By default, it also
prints the first n = 6 rows of the events.
An object of class "epidataCS" is a list containing the following
components:
events: a "SpatialPointsDataFrame" (see the description of the argument).
The input events are checked for requirements and sorted chronologically.
The columns are in the following order: obligatory event columns, event
marks, the columns BLOCK, start and endemic covariates copied from stgrid,
and finally, hidden auxiliary columns.
The added auxiliary columns are:
.obsInfLength: observed length of the infectious period (possibly
truncated at T), i.e., pmin(T-time, eps.t).
.sources: a list of numeric vectors of potential sources of infection (wrt
the interaction ranges eps.s and eps.t) for each event. Row numbers are
used as index.
.bdist: minimal distance of the event locations to the polygonal boundary
W.
.influenceRegion: a list of influence regions represented by objects of
the spatstat.geom class "owin". For each event, this is the intersection of
W with a (polygonal) circle of radius eps.s centered at the event's
location, shifted such that the event location becomes the origin. The list
has nCircle2Poly set as an attribute.
stgrid: a data.frame (see description of the argument). The spatio-temporal
grid of endemic covariates is sorted by time interval (indexed by the added
variable BLOCK) and region (tile). It is a full BLOCK x tile grid.
W: a "SpatialPolygons" object representing the observation region.
qmatrix: see the above description of the argument. The storage.mode of the
indicator matrix is set to logical and the dimnames are set to the levels
of the event types.
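The auxiliary columns .obsInfLength and .sources can be mimicked by hand on toy data, as in the following base R sketch. The events, the name T_end and the exact source rule are assumptions for illustration. In particular, the sketch takes an earlier event j to be a potential source of event i if i occurs within eps.t and eps.s of j and qmatrix permits transmission from the type of j to the type of i. The package's internal rule may differ in details such as the handling of boundary cases.

```r
## toy events, already sorted by time (coordinates in km, times in days)
ev <- data.frame(
  time  = c(1, 4, 6, 30, 95),
  x     = c(0, 30, 250, 10, 20),
  y     = c(0, 40,   0, 10, 20),
  type  = factor(c("B", "B", "C", "B", "B")),
  eps.t = 30, eps.s = 200
)
T_end <- 100                             # hypothetical end of observation
qmatrix <- diag(nlevels(ev$type)) == 1   # logical; independent spread of types
dimnames(qmatrix) <- list(levels(ev$type), levels(ev$type))

## .obsInfLength as defined above
ev$.obsInfLength <- pmin(T_end - ev$time, ev$eps.t)

## brute-force search for the potential sources of each event i:
## earlier events j within their own eps.t and eps.s whose type
## may transmit to the type of i according to qmatrix
d_space <- as.matrix(dist(ev[c("x", "y")]))
sources <- lapply(seq_len(nrow(ev)), function(i) {
  dt <- ev$time[i] - ev$time
  unname(which(dt > 0 & dt <= ev$eps.t &
               d_space[i, ] <= ev$eps.s &
               qmatrix[cbind(as.integer(ev$type), as.integer(ev$type[i]))]))
})

## spatial distances of the fourth event to its potential sources
d_space[4, sources[[4]]]
```

In this toy example, only the second and fourth events have potential sources (row 1, and rows 1 and 2, respectively), and only the last event has its infectious period truncated at T_end.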
Since the observation region W defines the integration domain in the point
process likelihood, the more detailed the polygons of W are, the longer it
will take to fit a twinstim. You are advised to sacrifice some shape
details for speed by reducing the polygon complexity, for example via the
mapshaper JavaScript library wrapped by the R package rmapshaper.
Alternative tools are provided by the packages maptools
(thinnedSpatialPoly) and spatstat.geom (simplify.owin).
Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi: 10.1111/j.1541-0420.2011.01684.x
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi: 10.18637/jss.v077.i11
Sebastian Meyer
Contributions to this documentation by Michael Höhle and Mayeul Kauffmann.
vignette("twinstim").
plot.epidataCS for plotting, and animate.epidataCS for the animation of
such an epidemic. There is also an update method for the "epidataCS" class.
To re-extract the events point pattern from "epidataCS", use
as(object, "SpatialPointsDataFrame").
It is possible to convert an "epidataCS" point pattern to an "epidata"
object (as.epidata.epidataCS), or to aggregate the events into an "sts"
object (epidataCS2sts).
## load "imdepi" example data (which is an object of class "epidataCS")
data("imdepi")
## print and summary
print(imdepi, n=5, digits=2)
print(s <- summary(imdepi))
plot(s$counter, # same as 'as.stepfun(imdepi)'
xlab = "Time [days]", ylab="Number of infectious individuals",
main=paste("Time course of the number of infectious individuals",
"assuming an infectious period of 30 days", sep="\n"))
plot(table(s$nSources), xlab="Number of \"close\" infective individuals",
ylab="Number of events",
main=paste("Distribution of the number of potential sources",
"assuming an interaction range of 200 km and 30 days",
sep="\n"))
## the summary object contains further information
str(s)
## a histogram of the spatial distances to potential source events
## (i.e., to events of the previous eps.t=30 days within eps.s=200 km)
sourceDists_space <- getSourceDists(imdepi, "space")
hist(sourceDists_space); rug(sourceDists_space)
## internal structure of an "epidataCS"-object
str(imdepi, max.level=4)
## see help("imdepi") for more info on the data set
## extraction methods subset the 'events' component
## (thereby taking care of the validity of the epidataCS object,
## for instance the hidden auxiliary column .sources)
imdepi[101:200,]
tail(imdepi, n=4) # reduce the epidemic to the last 4 events
subset(imdepi, type=="B") # only consider event type B
## see help("plot.epidataCS") for convenient plot-methods for "epidataCS"
###
### reconstruct the "imdepi" object
###
## observation region
load(system.file("shapes", "districtsD.RData", package="surveillance"),
verbose = TRUE)
summary(stateD)
## extract point pattern of events from the "imdepi" data
data(imdepi)
events <- marks(imdepi) # data frame with coordinate columns
coordinates(events) <- c("x", "y") # promote to a "SpatialPointsDataFrame"
#proj4string(events) <- proj4string(stateD)
events@proj4string <- stateD@proj4string # exact copy (avoid CRS reformatting)
## check that the manual reconstruction matches the coerce-method
events@coords.nrs <- numeric(0L)
stopifnot(all.equal(as(imdepi, "SpatialPointsDataFrame"), events))
## or, much simpler, use the corresponding coerce-method
events <- as(imdepi, "SpatialPointsDataFrame")
summary(events)
## plot observation region with events
plot(stateD, axes=TRUE); title(xlab="x [km]", ylab="y [km]")
points(events, pch=unclass(events$type), cex=0.5, col=unclass(events$type))
legend("topright", legend=levels(events$type), title="Type", pch=1:2, col=1:2)
## space-time grid with endemic covariates
head(stgrid <- imdepi$stgrid[,-1])
## reconstruct the "imdepi" object from its components
myimdepi <- as.epidataCS(events = events, stgrid = stgrid,
W = stateD, qmatrix = diag(2), nCircle2Poly = 16)
if (FALSE) {
## This reconstructed object is equal to 'imdepi' as long as the internal
## structures of the embedded classes ("owin", "SpatialPolygons", ...), and
## the calculation of the influence regions by "polyclip" have not changed:
stopifnot(all.equal(imdepi, myimdepi))
}