twinstim.Rd
A twinstim
model as described in Meyer et al. (2012) is fitted to
marked spatio-temporal point process data. This constitutes a
regression approach for conditional intensity function modelling.
The implementation is illustrated in Meyer et al. (2017, Section 3),
see vignette("twinstim")
.
twinstim(endemic, epidemic, siaf, tiaf, qmatrix = data$qmatrix, data,
subset, t0 = data$stgrid$start[1], T = tail(data$stgrid$stop,1),
na.action = na.fail, start = NULL, partial = FALSE,
epilink = "log", control.siaf = list(F = list(), Deriv = list()),
optim.args = list(), finetune = FALSE,
model = FALSE, cumCIF = FALSE, cumCIF.pb = interactive(),
cores = 1, verbose = TRUE)
right-hand side formula for the exponential (Cox-like
multiplicative) endemic component. May contain offsets (to be marked
by the special function offset
). If omitted or ~0
there will be no endemic component in the model. A type-specific
endemic intercept can be requested by including the term
(1|type)
in the formula.
formula representing the epidemic model for the event-specific
covariates (marks) determining infectivity. Offsets are not
implemented here. If omitted or ~0
there will be no epidemic
component in the model.
spatial interaction function. Possible specifications are:
NULL
or missing, corresponding to
siaf.constant()
, i.e. spatially homogeneous
infectivity independent of the distance from the host
a list as returned by siaf
or, more commonly,
generated by a predefined interaction function such as
siaf.gaussian
as in Meyer et al. (2012) or
siaf.powerlaw
as in Meyer and Held (2014).
The latter requires unique event locations, possibly after random
tie-breaking (untie
) or imputation of
interval-censored locations.
siaf.exponential
is a simpler alternative.
a numeric vector corresponding to the knots of a step
function, i.e. the same as siaf.step(knots)
If you run into “false convergence” with a non-constant
siaf
specification, the numerical accuracy of the cubature
methods is most likely too low (see the control.siaf
argument).
temporal interaction function. Possible specifications are:
NULL
or missing, corresponding to
tiaf.constant()
, i.e. time-constant infectivity
a list as returned by tiaf
or by a
predefined interaction function such as
tiaf.exponential
a numeric vector corresponding to the knots of a step
function, i.e. the same as tiaf.step(knots)
square indicator matrix (0/1 or FALSE
/TRUE
) for
possible transmission between the event types. The matrix will be
internally converted to logical
. Defaults to the \(Q\) matrix
specified in data
.
an object of class "epidataCS"
.
an optional vector evaluating to logical indicating a subset of
data$events
to keep. Missing values are
taken as FALSE
. The expression is evaluated in the context of the
data$events@data
data.frame
, i.e. columns of this
data.frame
may be referenced directly by name.
events having occurred during (-Inf;t0] are regarded as part of the
prehistory \(H_0\) of the process. Only events that occurred in the
interval (t0; T] are considered in the likelihood.
The time point t0
(T
) must
be an element of data$stgrid$start
(data$stgrid$stop
).
The default time range covers the whole spatio-temporal grid
of endemic covariates.
how to deal with missing values in data$events
? Do not use
na.pass
. Missing values in the spatio-temporal grid
data$stgrid
are not accepted.
a named vector of initial values for (a subset of) the parameters.
The names must conform to the conventions of twinstim
to be
assigned to the correct model terms. For instance,
"h.(Intercept)"
= endemic intercept,
"h.I(start/365)"
= coefficient of a linear time
trend in the endemic component, "h.factorB"
=
coefficient of the level B of the factor variable factor
in
the endemic predictor, "e.(Intercept)"
= epidemic intercept,
"e.VAR"
= coefficient of the epidemic term VAR
,
"e.siaf.2"
= second siaf
parameter,
"e.tiaf.1"
= first tiaf
parameter.
Elements which don't match any of the model parameters are ignored.
Alternatively, start
may also be a named list with elements
"endemic"
or "h"
, "epidemic"
or "e"
,
"siaf"
or "e.siaf"
, and "tiaf"
or "e.tiaf"
,
each of which containing a named numeric vector with the term labels
as names (i.e. without the prefix "h."
, "e."
, etc).
Thus, start=list(endemic=c("(Intercept)"=-10))
is equivalent
to start=c("h.(Intercept)"=-10)
.
logical indicating if a partial likelihood similar to the approach
by Diggle et al. (2010) should be used (default is FALSE
).
Note that the partial likelihood implementation is not well tested.
a character string determining the link function to be used for the
epidemic
linear predictor of event marks. By default, the
log-link is used. The experimental alternative
epilink = "identity"
(for use by epitest
) does
not guarantee the force of infection to be positive. If this leads
to a negative total intensity (endemic + epidemic), the point
process is not well defined (the log-likelihood will be
NaN
).
a list with elements "F"
and "Deriv"
, which are lists
of extra arguments passed to the functions siaf$F
and
siaf$Deriv
, respectively.
These arguments control the accuracy of the cubature routines from
package polyCub involved in non-constant siaf
specifications, e.g., the bandwidth of the midpoint rule
polyCub.midpoint
, the number of Gaussian
quadrature points for polyCub.SV
, or the relative
tolerance of integrate
in polyCub.iso
.
For instance, siaf.gaussian(F.adaptive = TRUE)
uses the
midpoint-cubature polyCub.midpoint
with an
adaptive bandwidth of eps=adapt*sd
to numerically integrate the
kernel \(f(\bold{s})\), and the default adapt
value (0.1)
can be overwritten by setting control.siaf$F$adapt
.
However, the default version siaf.gaussian()
as well as siaf.powerlaw()
and friends use
polyCub.iso
and thus accept control arguments for the
standard integrate
routine (such as rel.tol
)
via control.siaf$F
and control.siaf$Deriv
.
This argument list is ignored in the case
siaf=siaf.constant()
(which is the default if siaf
is
unspecified).
an argument list passed to optim
, or NULL
, in
which case no optimization will be performed
but the necessary functions will be returned in a list (similar to
what is returned if model = TRUE
).
Initial values for the parameters may be given as list element
par
in the order (endemic, epidemic, siaf, tiaf)
.
If no initial values are provided, crude estimates will be used for
the endemic intercept and the Gaussian kernel, -9 for the epidemic
intercept, and zeroes for the remaining parameters.
Any initial values given in the start
argument
take precedence over those in par
.
Note that optim
receives the negative log-likelihood for
minimization (thus, if used, optim.args$control$fnscale
should be
positive). The hessian
argument defaults to TRUE
, and
in the control
list, trace
ing is enabled with
REPORT=1
by default. By setting
optim.args$control$trace = 0
, all output from the
optimization routine is suppressed.
For the partial
likelihood, the analytic score function and
the Fisher information are not implemented and the default is to use
robust method="Nelder-Mead"
optimization.
There may be an extra component fixed
in the
optim.args
list, which determines which parameters should stick
to their initial values. This can be specified by a
logical vector of the same length as the par
component, by an
integer vector indexing par
or by a character vector following
the twinstim
naming conventions. Furthermore, if
isTRUE(fixed)
, then all parameters are fixed at their initial
values and no optimization is performed.
Importantly, the method
argument in the optim.args
list may also be "nlminb"
, in
which case the nlminb
optimizer is used. This is also the
default for full likelihood inference.
In this case, not only the score function but also the
expected Fisher information can be used during optimization (as
estimated by what Martinussen and Scheike (2006, p. 64) call the
“optional variation process”, or see Rathbun (1996, equation
(4.7))). In our experience this gives better convergence than
optim
's methods.
For method="nlminb"
, the following parameters of the
optim.args$control
list may be named like for
optim
and are renamed appropriately:
maxit
(-> iter.max
), REPORT
(-> trace
,
default: 1), abstol
(-> abs.tol
), and
reltol
(-> rel.tol
, default: 1e-6
).
For nlminb
, a logical hessian
argument (default:
TRUE
) indicates if the negative expected Fisher
information matrix should be used as the Hessian during optimization
(otherwise a numerical approximation is used).
Similarly, method="nlm"
should also work but is not
recommended here.
logical indicating if a second maximisation should be performed with
robust Nelder-Mead optim
using the resulting
parameters from the first maximisation as starting point. This
argument is only considered if partial = FALSE
and the
default is to not conduct a second maximization (in most cases this
does not improve upon the MLE).
logical indicating if the model environment should be kept with the
result, which is required for
intensityplot
s and
R0(..., trimmed = FALSE)
.
Specifically, if model=TRUE
, the return value will have the
evaluation environment set as its environment
,
and the returned functions
element will contain the
log-likelihood function (or partial log-likelihood function, if
partial = TRUE
), and optionally the score and the expected
Fisher information functions (not for the partial likelihood, and
only if siaf
and tiaf
provide the necessary
derivatives).
Note that fitted objects with a model environment might consume
quiet a lot of memory since they contain the data
.
logical (default: FALSE
) indicating whether to
calculate the fitted cumulative ground intensity at event times.
This is the residual process, see residuals.twinstim
.
logical indicating if a progress bar should be shown
during the calculation of cumCIF
. Defaults to do so in an
interactive R session, and will be FALSE
if cores != 1
.
number of processes to use in parallel operation. By default
twinstim
runs in single-CPU mode. Currently, only the
multicore-type of parallel computing via forking is
supported, which is not available on Windows, see
mclapply
in package parallel.
Note that for a memoised siaf.step
kernel,
cores=1
is fixed internally since parallelization would slow
down model fitting significantly.
logical indicating if information should be printed during
execution. Defaults to TRUE
.
The function performs maximum likelihood inference
for the additive-multiplicative spatio-temporal intensity model
described in Meyer et al. (2012). It uses nlminb
as the
default optimizer and returns an object of class twinstim
.
Such objects have print
, plot
and
summary
methods.
The output of the summary
can be
processed by the toLatex
function. Furthermore, the usual model
fit methods such as coef
, vcov
, logLik
,
residuals
, and
update
are implemented.
A specific add-on is the use of the functions R0
and
simulate
.
Returns an S3 object of class "twinstim"
, which is a list with
the following components:
vector containing the MLE.
value of the log-likelihood function at the MLE with a
logical attribute "partial"
indicating if the partial
likelihood was used.
number of log-likelihood and score evaluations during optimization.
either TRUE
(if the optimizer converged) or a
character string containing a failure message.
expected Fisher information evaluated at the
MLE. Only non-NULL
for full likelihood inference
(partial = FALSE
) and if spatial and temporal interaction
functions are provided with their derivatives.
observed Fisher information matrix
evaluated at the value of the MLE. Obtained as the negative Hessian.
Only non-NULL
if optim.args$method
is not
"nlminb"
and if it was requested by setting
hessian=TRUE
in optim.args
.
fitted values of the conditional intensity function at the events.
two-column matrix with columns "h"
and
"e"
containing the fitted values of the endemic and epidemic
components, respectively.
(Note that rowSums(fittedComponents) == fitted
.)
fitted cumulative ground intensities at the event times.
Only non-NULL
if cumCIF = TRUE
.
This is the “residual process” of the model, see
residuals.twinstim
.
estimated basic reproduction number for each event. This equals the spatio-temporal integral of the epidemic intensity over the observation domain (t0;T] x W for each event.
vector describing the lengths of the 5 parameter
subvectors: endemic intercept(s) \(\beta_0(\kappa)\), endemic
coefficients \(\beta\), epidemic coefficients \(\gamma\),
parameters of the siaf
kernel, and parameters of the
tiaf
kernel.
the qmatrix
associated with the epidemic
data
as supplied in the model call.
the bounding box of data$W
.
the time range used for fitting: c(t0,T)
.
a list containing the four main parts of the model
specification: endemic
, epidemic
, siaf
, and
tiaf
.
a record of the levels of the factors used in fitting.
see the “Arguments” section above.
input optimizer arguments used to determine the MLE.
if model=TRUE
this is a list
with
components ll
, sc
and fi
, which are functions
evaluating the log-likelihood, the score function and the expected
Fisher information for a parameter vector \(\theta\). The
environment
of these function is the model environment, which
is thus retained in the workspace if model=TRUE
. Otherwise,
the functions
component is NULL
.
the matched call.
the proc.time
-queried time taken
to fit the model, i.e., a named numeric vector of length 5 of class
"proc_time"
, with the number of cores
set as
additional attribute.
twinstim
makes use of the memoise package if it is
available -- and that is highly recommended for non-constant
siaf
specifications to speed up calculations. Specifically, the
necessary numerical integrations of the spatial interaction function
will be cached such that they are only calculated once for every
state of the siaf
parameters during optimization.
Diggle, P. J., Kaimi, I. & Abellana, R. (2010): Partial-likelihood analysis of spatio-temporal point-process data. Biometrics, 66, 347-354.
Martinussen, T. and Scheike, T. H. (2006): Dynamic Regression Models for Survival Data. Springer.
Meyer, S. (2010):
Spatio-Temporal Infectious Disease Epidemiology based on Point Processes.
Master's Thesis, Ludwig-Maximilians-Universität
München.
Available as https://epub.ub.uni-muenchen.de/11703/
Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi: 10.1111/j.1541-0420.2011.01684.x
Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612-1639. doi: 10.1214/14-AOAS743
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi: 10.18637/jss.v077.i11
Rathbun, S. L. (1996): Asymptotic properties of the maximum likelihood estimator for spatio-temporal point processes. Journal of Statistical Planning and Inference, 51, 55-74.
Sebastian Meyer
Contributions to this documentation by Michael Höhle and Mayeul Kauffmann.
vignette("twinstim")
.
There is a simulate.twinstim
method,
which simulates the point process based on the fitted twinstim
.
A discrete-space alternative is offered by the twinSIR
modelling framework.
# Load invasive meningococcal disease data
data("imdepi")
### first, fit a simple endemic-only model
m_noepi <- twinstim(
endemic = addSeason2formula(~ offset(log(popdensity)) + I(start/365-3.5),
S=1, period=365, timevar="start"),
data = imdepi, subset = !is.na(agegrp)
)
## look at the model summary
summary(m_noepi)
## there is no evidence for a type-dependent endemic intercept (LR test)
m_noepi_type <- update(m_noepi, endemic = ~(1|type) + .)
pchisq(2*c(logLik(m_noepi_type)-logLik(m_noepi)), df=1, lower.tail=FALSE)
### add an epidemic component with just the intercept, i.e.
### assuming uniform dispersal in time and space up to a distance of
### eps.s = 200 km and eps.t = 30 days (see summary(imdepi))
m0 <- update(m_noepi, epidemic=~1, model=TRUE)
## summarize the model fit
s <- summary(m0, correlation = TRUE, symbolic.cor = TRUE)
s
# output the table of coefficients as LaTeX code
toLatex(s, digits=2)
# or, to report rate ratios
xtable(s)
## the default confint-method can be used for Wald-CI's
confint(m0, level=0.95)
## same "untrimmed" R0 for every event (simple epidemic intercept model)
summary(R0(m0, trimmed=FALSE))
## plot the path of the fitted total intensity
plot(m0, "total intensity", tgrid=500)
## extract "residual process" integrating over space (takes some seconds)
if (surveillance.options("allExamples"))
{
res <- residuals(m0)
# if the model describes the true CIF well _in the temporal dimension_,
# then this residual process should behave like a stationary Poisson
# process with intensity 1
plot(res, type="l"); abline(h=c(0, length(res)), lty=2)
# easier, with CI and serial correlation -> checkResidualProcess()
checkResidualProcess(m0)
}
if (FALSE) {
## NB: in contrast to nlminb(), optim's BFGS would miss the
## likelihood maximum wrt the epidemic intercept
m0_BFGS <- update(m_noepi, epidemic=~1, optim.args = list(method="BFGS"))
format(cbind(nlminb=coef(m0), BFGS=coef(m0_BFGS)), digits=3, scientific=FALSE)
m0_BFGS$fisherinfo # singular Fisher information matrix here
m0$fisherinfo
logLik(m0_BFGS)
logLik(m0)
## nlminb is more powerful since we make use of the analytical fisherinfo
## as estimated by the model during optimization, which optim cannot
}
### an epidemic-only model?
## for a purely epidemic model, all events must have potential source events
## (otherwise the intensity at the observed event would be 0)
## let's focus on the C-type for this example
imdepiC <- subset(imdepi, type == "C")
table(summary(imdepiC)$nSources)
## 106 events have no prior, close events (in terms of eps.s and eps.t)
try(twinstim(epidemic = ~1, data = imdepiC)) # detects this problem
## let's assume spatially unbounded interaction
imdepiC_infeps <- update(imdepiC, eps.s = Inf)
(s <- summary(imdepiC_infeps))
table(s$nSources)
## for 11 events, there is no prior event within eps.t = 30 days
## (which is certainly true for the first event)
plot(s$counter, main = "Number of infectious individuals over time (eps.t = 30)")
rug(imdepiC_infeps$events$time)
rug(imdepiC_infeps$events$time[s$nSources == 0], col = 2, lwd = 3)
## An endemic component would catch such events (from unobserved sources),
## otherwise a longer infectious period would need to be assumed and
## for the first event to happen, a prehistory is required (e.g., t0 = 31).
## As an example, we fit the data only until T = 638 (all events have ancestors)
m_epi <- twinstim(epidemic = ~1, data = imdepiC_infeps, t0 = 31, T = 638)
summary(m_epi)
### full model with interaction functions (time-consuming)
if (surveillance.options("allExamples"))
{
## estimate an exponential temporal decay of infectivity
m1_tiaf <- update(m0, tiaf=tiaf.exponential())
plot(m1_tiaf, "tiaf", scaled=FALSE)
## estimate a step function for spatial interaction
summary(sourceDists <- getSourceDists(imdepi, "space"))
(knots <- quantile(sourceDists, c(5,10,20,40)/100))
m1_fstep <- update(m0, siaf=knots)
plot(m1_fstep, "siaf", scaled=FALSE)
rug(sourceDists, ticksize=0.02)
## estimate a continuously decreasing spatial interaction function,
## here we use the kernel of an isotropic bivariate Gaussian
m1 <- update(m0, siaf = siaf.gaussian())
AIC(m_noepi, m0, m1_fstep, m1)
summary(m1) # e.siaf.1 is log(sigma), no test for H0: log(sigma) = 0
exp(confint(m1, "e.siaf.1")) # a confidence interval for sigma
plot(m1, "siaf", scaled=FALSE)
## alternative: siaf.powerlaw() with eps.s=Inf and untie()d data,
## see vignette("twinstim")
## add epidemic covariates
m2 <- update(m1, epidemic = ~ 1 + type + agegrp)
AIC(m1, m2) # further improvement
summary(m2)
## look at estimated R0 values by event type
tapply(R0(m2), imdepi$events@data[names(R0(m2)), "type"], summary)
}