Continuous-Time SIR Event History of a Fixed Population
epidata.Rd
The function as.epidata is used to generate objects of class "epidata".
Objects of this class are specific data frames containing the event history
of an epidemic together with some additional attributes. These objects are
the basis for fitting spatio-temporal epidemic intensity models with the
function twinSIR. Their implementation is illustrated in Meyer et al.
(2017, Section 4), see vignette("twinSIR").
Note that the spatial information itself, i.e., the positions of the
individuals, is assumed to be constant over time. In addition to epidemics
following the SIR compartmental model, data from SI, SIRS and SIS epidemics
may also be supplied.
Usage
as.epidata(data, ...)
# S3 method for class 'data.frame'
as.epidata(data, t0,
           tE.col, tI.col, tR.col, id.col, coords.cols,
           f = list(), w = list(), D = dist,
           max.time = NULL, keep.cols = TRUE, ...)
# Default S3 method
as.epidata(data, id.col, start.col, stop.col,
           atRiskY.col, event.col, Revent.col, coords.cols,
           f = list(), w = list(), D = dist, .latent = FALSE, ...)
# S3 method for class 'epidata'
print(x, ...)
# S3 method for class 'epidata'
x[i, j, drop]
# S3 method for class 'epidata'
update(object, f = list(), w = list(), D = dist, ...)
Arguments
- data
For the data.frame-method, a data frame with as many rows as there are individuals in the population and time columns indicating when each individual became exposed (optional), infectious (mandatory, but can be NA for non-affected individuals) and removed (optional). Note that this data format does not allow for re-infection (SIRS) and time-varying covariates. The data.frame-method converts the individual-indexed data frame to the long event history start/stop format and then feeds it into the default method. If calling the generic function as.epidata on a data.frame and the t0 argument is missing, the default method is called directly.
For the default method, data can be a matrix or a data.frame. It must contain the observed event history in a form similar to Surv(, type="counting") in package survival, with additional information (variables) along the process. Rows will be sorted automatically during conversion. The observation period is split up into consecutive intervals of constant state - thus constant infection intensities. The data frame consists of a block of \(N\) (number of individuals) rows for each of those time intervals (all rows in a block have the same start and stop values... therefore the name “block”), where there is one row per individual in the block. Each row describes the (fixed) state of the individual during the interval given by the start and stop columns start.col and stop.col.
Note that there may not be more than one event (infection or removal) in a single block. Thus, in a single block, only one entry in the event.col and Revent.col may be 1, all others are 0. This rule follows the point process characteristic that there are no concurrent events (infections or removals).
- t0, max.time
observation period. In the resulting "epidata", the time scale will be relative to the start time t0. Individuals that have already been removed prior to t0, i.e., rows with tR <= t0, will be dropped. The end of the observation period (max.time) will by default (NULL, or if NA) coincide with the last observed event.
- tE.col, tI.col, tR.col
single numeric or character indexes of the time columns in data, which specify when the individuals became exposed, infectious and removed, respectively. tE.col and tR.col can be missing, corresponding to SIR, SEI, or SI data. NA entries mean that the respective event has not (yet) occurred. Note that is.na(tE) implies is.na(tI) and is.na(tR), and is.na(tI) implies is.na(tR) (and this is checked for the provided data).
CAVE: Support for latent periods (tE.col) is experimental! twinSIR cannot handle them anyway.
- id.col
single numeric or character index of the id column in data. The id column identifies the individuals in the data frame. It is converted to a factor by calling factor, i.e., unused levels are dropped if it already was a factor.
- start.col
single index of the start column in data. Can be numeric (by column number) or character (by column name). The start column contains the (numeric) time points of the beginnings of the consecutive time intervals of the event history. The minimum value in this column, i.e., the start of the observation period, should be 0.
- stop.col
single index of the stop column in data. Can be numeric (by column number) or character (by column name). The stop column contains the (numeric) time points of the ends of the consecutive time intervals of the event history. The stop value must always be greater than the start value of a row.
- atRiskY.col
single index of the atRiskY column in data. Can be numeric (by column number) or character (by column name). The atRiskY column indicates if the individual was “at-risk” of becoming infected during the time interval (start; stop]. This variable must be logical or in 0/1-coding. Individuals with atRiskY == 0 in the first time interval (normally the rows with start == 0) are taken as initially infectious.
- event.col
single index of the event column in data. Can be numeric (by column number) or character (by column name). The event column indicates if the individual became infected at the stop time of the interval. This variable must be logical or in 0/1-coding.
- Revent.col
single index of the Revent column in data. Can be numeric (by column number) or character (by column name). The Revent column indicates if the individual recovered (was removed) at the stop time of the interval. This variable must be logical or in 0/1-coding.
- coords.cols
indexes of the coords columns in data. Can be numeric (by column number), character (by column name), or NULL (no coordinates, e.g., if D is a pre-specified distance matrix). These columns contain the individuals' coordinates, which determine the distance matrix for the distance-based components of the force of infection (see argument f). By default, Euclidean distance is used (see argument D).
Note that the functions related to twinSIR currently assume fixed positions of the individuals during the whole epidemic. Thus, an individual has the same coordinates in every block. For simplicity, the coordinates are derived from the first time block only (normally the rows with start == 0).
The animate-method requires coordinates.
- f
a named list of vectorized functions for a distance-based force of infection. The functions must operate elementwise on a (distance) matrix D so that f[[m]](D) results in a matrix. A simple example is function(u) {u <= 1}, which indicates if the Euclidean distance between the individuals is smaller than or equal to 1. The names of the functions determine the names of the epidemic variables in the resulting data frame. So, the names should not coincide with names of other covariates. The distance-based weights are computed as follows: Let \(I(t)\) denote the set of infectious individuals just before time \(t\). Then, for individual \(i\) at time \(t\), the \(m\)'th covariate has the value \(\sum_{j \in I(t)} f_m(d_{ij})\), where \(d_{ij}\) denotes entries of the distance matrix (by default this is the Euclidean distance \(\|s_i - s_j\|\) between the individuals' coordinates, but see argument D).
- w
a named list of vectorized functions for extra covariate-based weights \(w_{ij}\) in the epidemic component. Each function operates on a single time-constant covariate in data, which is determined by the name of the first argument: The two function arguments should be named varname.i and varname.j, where varname is one of names(data). Similar to the components in f, length(w) epidemic covariates will be generated in the resulting "epidata" named according to names(w). So, the names should not coincide with names of other covariates. For individual \(i\) at time \(t\), the \(m\)'th such covariate has the value \(\sum_{j \in I(t)} w_m(z^{(m)}_i, z^{(m)}_j)\), where \(z^{(m)}\) denotes the variable in data associated with w[[m]].
- D
either a function to calculate the distances between the individuals, with locations taken from coords.cols (the default is Euclidean distance via the function dist) and the result converted to a matrix via as.matrix, or a pre-computed distance matrix with dimnames containing the individual ids (a classed "Matrix" is supported).
- keep.cols
logical indicating if all columns in data should be retained (and not only the obligatory "epidata" columns), in particular any additional columns with time-constant individual-specific covariates. Alternatively, keep.cols can be a numeric or character vector indexing columns of data to keep.
- .latent
(internal) logical indicating whether to allow for latent periods (EXPERIMENTAL). Otherwise (default), the function verifies that an event (i.e., switching to the I state) only happens when the respective individual is at risk (i.e., in the S state).
- x, object
an object of class "epidata".
- ...
arguments passed to print.data.frame. Currently unused in the as.epidata-methods.
- i, j, drop
arguments passed to [.data.frame.
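The formulas given for the arguments f and w can be traced by hand on a toy population. The following is a minimal base R sketch, not the package's internal code: the coordinates, the covariate group, the set of infectious individuals and all object names are made up for illustration, and it assumes that an individual does not contribute to its own covariate value.

```r
## Hypothetical toy population: four individuals with fixed coordinates
coords <- cbind(x = c(0, 0, 1, 3), y = c(0, 0.5, 0, 4))
rownames(coords) <- c("a", "b", "c", "d")
D <- as.matrix(dist(coords))  # Euclidean distance matrix with entries d_ij

## distance-based weight function, as in the description of argument 'f'
f <- list(close = function(u) u <= 1)

## covariate-based weight function, as in the description of argument 'w'
## (the time-constant covariate 'group' is invented for this sketch)
group <- c(a = "A", b = "B", c = "A", d = "A")
w <- list(sameGroup = function(group.i, group.j) group.i == group.j)

## assumed set I(t) of individuals that are infectious just before time t
infectious <- c("b", "d")

## sum over j in I(t) of f_m(d_ij), for every individual i
fD <- f$close(D)    # the function operates elementwise on the matrix
diag(fD) <- FALSE   # assumption of this sketch: no self-contribution
x_close <- rowSums(fD[, infectious, drop = FALSE])

## sum over j in I(t) of w_m(z_i, z_j), for every individual i
W <- outer(group, group, w$sameGroup)
diag(W) <- FALSE    # same assumption as above
x_sameGroup <- rowSums(W[, infectious, drop = FALSE])

cbind(close = x_close, sameGroup = x_sameGroup)
```

In an actual analysis these columns are generated by as.epidata or update for every time block; the sketch only evaluates a single time point.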
Details
The print method for objects of class "epidata" simply prints the data
frame with a small header containing the time range of the observed
epidemic and the number of infected individuals. Usually, the data frames
are quite long, so the summary method summary.epidata might be useful.
Also, indexing/subsetting "epidata" works exactly as for data.frames, but
there is a dedicated method, which ensures consistency of the resulting
"epidata" or drops this class, if necessary.
The update-method can be used to add or replace distance-based (f) or
covariate-based (w) epidemic variables in an existing "epidata" object.
SIS epidemics are implemented as SIRS epidemics where the length of the removal period equals 0. This means that an individual who has an R-event will be at risk again immediately afterwards, i.e., in the following time block. Therefore, data of SIS epidemics have to be provided in that form, containing “pseudo-R-events”.
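As a rough illustration of such pseudo-R-events, the sketch below shows the rows of a single hypothetical individual in the start/stop format. The time points are invented, and the rows of all other individuals (which a complete event history would contain in every block) are omitted for brevity.

```r
## Hypothetical SIS history of one individual, one row per time block
sis <- data.frame(
  id      = 1,
  start   = c(0, 3, 4, 6),
  stop    = c(3, 4, 6, 8),
  atRiskY = c(1, 0, 1, 0),  # susceptible, infectious, susceptible, infectious
  event   = c(1, 0, 1, 0),  # infections at times 3 and 6
  Revent  = c(0, 1, 0, 0)   # pseudo-R-event at time 4
)

## a removal period of length 0 means: the block following an R-event
## already has the individual at risk again
afterR <- which(sis$Revent == 1) + 1
stopifnot(all(sis$atRiskY[afterR] == 1))
sis
```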
Note
The column name "BLOCK" is a reserved name. This column will be added
automatically at conversion and the resulting data frame will be sorted by
this column and by id. Also the names "id", "start", "stop", "atRiskY",
"event" and "Revent" are reserved for the respective columns only.
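To make the block structure concrete, here is a small base R sketch of an event history in the long start/stop format expected by the default method. The data are invented, and the way the BLOCK index is derived here is only an illustration of the idea, not the package's implementation.

```r
## Hypothetical event history for N = 3 individuals and three time blocks:
## individual 1 is initially infectious, infects individual 2 at time 2,
## and is removed at time 5; individual 3 stays susceptible
evHist <- data.frame(
  id      = rep(1:3, times = 3),
  start   = rep(c(0, 2, 5), each = 3),
  stop    = rep(c(2, 5, 7), each = 3),
  atRiskY = c(0, 1, 1,   0, 0, 1,   0, 0, 1),
  event   = c(0, 1, 0,   0, 0, 0,   0, 0, 0),
  Revent  = c(0, 0, 0,   1, 0, 0,   0, 0, 0)
)

## illustrative block index: consecutive number of the time interval
evHist$BLOCK <- match(evHist$start, sort(unique(evHist$start)))
evHist <- evHist[order(evHist$BLOCK, evHist$id), ]

## at most one event (infection or removal) per block,
## and every interval has positive length
eventsPerBlock <- tapply(evHist$event + evHist$Revent, evHist$BLOCK, sum)
stopifnot(all(eventsPerBlock <= 1), all(evHist$stop > evHist$start))
eventsPerBlock
```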
Value
a data.frame with the columns "BLOCK", "id", "start", "stop", "atRiskY",
"event", "Revent" and the coordinate columns (with the original names from
data), which are all obligatory. These columns are followed by any
remaining columns of the input data. Finally, the newly generated columns
with epidemic variables corresponding to the functions in the list f are
appended, if length(f) > 0.
The data.frame is given the additional attributes
- "eventTimes"
numeric vector of infection time points (sorted chronologically).
- "timeRange"
numeric vector of length 2: c(min(start), max(stop)).
- "coords.cols"
numeric vector containing the column indices of the coordinate columns in the resulting data frame.
- "f"
this equals the argument f.
- "w"
this equals the argument w.
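These attributes can be inspected with attr. The following sketch assumes that the surveillance package is installed and uses its prepared hagelloch "epidata" object; the vector epidataAttrs is just a helper name chosen here.

```r
## names of the additional attributes listed above
epidataAttrs <- c("eventTimes", "timeRange", "coords.cols", "f", "w")

## only run the inspection if the 'surveillance' package is available
if (requireNamespace("surveillance", quietly = TRUE)) {
  data("hagelloch", package = "surveillance")
  for (a in epidataAttrs) {
    cat(a, ":\n", sep = "")
    str(attr(hagelloch, a))  # compact display of each attribute
  }
}
```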
See also
The hagelloch data as an example.
The plot and the summary method for class "epidata". Furthermore, the
function animate.epidata for the animation of epidemics.
Function twinSIR for fitting spatio-temporal epidemic intensity models to
epidemic data.
Function simEpidata for the simulation of epidemic data.
References
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
Examples
data("hagelloch") # see help("hagelloch") for a description
head(hagelloch.df)
## convert the original data frame to an "epidata" event history
myEpi <- as.epidata(hagelloch.df, t0 = 0,
                    tI.col = "tI", tR.col = "tR", id.col = "PN",
                    coords.cols = c("x.loc", "y.loc"),
                    keep.cols = c("SEX", "AGE", "CL"))
str(myEpi)
head(as.data.frame(myEpi)) # "epidata" has event history format
summary(myEpi) # see 'summary.epidata'
plot(myEpi) # see 'plot.epidata' and also 'animate.epidata'
## add distance- and covariate-based weights for the force of infection
## in a twinSIR model, see vignette("twinSIR") for a description
myEpi <- update(myEpi,
  f = list(
    household    = function(u) u == 0,
    nothousehold = function(u) u > 0
  ),
  w = list(
    c1 = function (CL.i, CL.j) CL.i == "1st class" & CL.j == CL.i,
    c2 = function (CL.i, CL.j) CL.i == "2nd class" & CL.j == CL.i
  )
)
## this is now identical to the prepared hagelloch "epidata"
stopifnot(all.equal(myEpi, hagelloch))