# Fit a Two-Component Spatio-Temporal Point Process Model

`twinstim.Rd`

A `twinstim`

model as described in Meyer et al. (2012) is fitted to
marked spatio-temporal point process data. This constitutes a
regression approach for conditional intensity function modelling.
The implementation is illustrated in Meyer et al. (2017, Section 3),
see `vignette("twinstim")`

.

## Usage

```
twinstim(endemic, epidemic, siaf, tiaf, qmatrix = data$qmatrix, data,
subset, t0 = data$stgrid$start[1], T = tail(data$stgrid$stop,1),
na.action = na.fail, start = NULL, partial = FALSE,
epilink = "log", control.siaf = list(F = list(), Deriv = list()),
optim.args = list(), finetune = FALSE,
model = FALSE, cumCIF = FALSE, cumCIF.pb = interactive(),
cores = 1, verbose = TRUE)
```

## Arguments

- endemic
right-hand side formula for the exponential (Cox-like multiplicative) endemic component. May contain offsets (to be marked by the special function

`offset`

). If omitted or`~0`

there will be no endemic component in the model. A type-specific endemic intercept can be requested by including the term`(1|type)`

in the formula.- epidemic
formula representing the epidemic model for the event-specific covariates (marks) determining infectivity. Offsets are not implemented here. If omitted or

`~0`

there will be no epidemic component in the model.- siaf
spatial interaction function. Possible specifications are:

`NULL`

or missing, corresponding to`siaf.constant()`

, i.e. spatially homogeneous infectivity independent of the distance from the hosta list as returned by

`siaf`

or, more commonly, generated by a predefined interaction function such as`siaf.gaussian`

as in Meyer et al. (2012) or`siaf.powerlaw`

as in Meyer and Held (2014). The latter requires unique event locations, possibly after random tie-breaking (`untie`

) or imputation of interval-censored locations.`siaf.exponential`

is a simpler alternative.a numeric vector corresponding to the knots of a step function, i.e. the same as

`siaf.step(knots)`

If you run into “false convergence” with a non-constant

`siaf`

specification, the numerical accuracy of the cubature methods is most likely too low (see the`control.siaf`

argument).- tiaf
temporal interaction function. Possible specifications are:

`NULL`

or missing, corresponding to`tiaf.constant()`

, i.e. time-constant infectivitya list as returned by

`tiaf`

or by a predefined interaction function such as`tiaf.exponential`

a numeric vector corresponding to the knots of a step function, i.e. the same as

`tiaf.step(knots)`

- qmatrix
square indicator matrix (0/1 or

`FALSE`

/`TRUE`

) for possible transmission between the event types. The matrix will be internally converted to`logical`

. Defaults to the \(Q\) matrix specified in`data`

.- data
an object of class

`"epidataCS"`

.- subset
an optional vector evaluating to logical indicating a subset of

`data$events`

to keep. Missing values are taken as`FALSE`

. The expression is evaluated in the context of the`data$events@data`

`data.frame`

, i.e. columns of this`data.frame`

may be referenced directly by name.- t0, T
events having occurred during (-Inf;t0] are regarded as part of the prehistory \(H_0\) of the process. Only events that occurred in the interval (t0; T] are considered in the likelihood. The time point

`t0`

(`T`

) must be an element of`data$stgrid$start`

(`data$stgrid$stop`

). The default time range covers the whole spatio-temporal grid of endemic covariates.- na.action
how to deal with missing values in

`data$events`

? Do not use`na.pass`

. Missing values in the spatio-temporal grid`data$stgrid`

are not accepted.- start
a named vector of initial values for (a subset of) the parameters. The names must conform to the conventions of

`twinstim`

to be assigned to the correct model terms. For instance,`"h.(Intercept)"`

= endemic intercept,`"h.I(start/365)"`

= coefficient of a linear time trend in the endemic component,`"h.factorB"`

= coefficient of the level B of the factor variable`factor`

in the endemic predictor,`"e.(Intercept)"`

= epidemic intercept,`"e.VAR"`

= coefficient of the epidemic term`VAR`

,`"e.siaf.2"`

= second`siaf`

parameter,`"e.tiaf.1"`

= first`tiaf`

parameter. Elements which don't match any of the model parameters are ignored.Alternatively,

`start`

may also be a named list with elements`"endemic"`

or`"h"`

,`"epidemic"`

or`"e"`

,`"siaf"`

or`"e.siaf"`

, and`"tiaf"`

or`"e.tiaf"`

, each of which containing a named numeric vector with the term labels as names (i.e. without the prefix`"h."`

,`"e."`

, etc). Thus,`start=list(endemic=c("(Intercept)"=-10))`

is equivalent to`start=c("h.(Intercept)"=-10)`

.- partial
logical indicating if a partial likelihood similar to the approach by Diggle et al. (2010) should be used (default is

`FALSE`

). Note that the partial likelihood implementation is not well tested.- epilink
a character string determining the link function to be used for the

`epidemic`

linear predictor of event marks. By default, the log-link is used. The experimental alternative`epilink = "identity"`

(for use by`epitest`

) does not guarantee the force of infection to be positive. If this leads to a negative total intensity (endemic + epidemic), the point process is not well defined (the log-likelihood will be`NaN`

).- control.siaf
a list with elements

`"F"`

and`"Deriv"`

, which are lists of extra arguments passed to the functions`siaf$F`

and`siaf$Deriv`

, respectively.

These arguments control the accuracy of the cubature routines from package polyCub involved in non-constant`siaf`

specifications, e.g., the bandwidth of the midpoint rule`polyCub.midpoint`

, the number of Gaussian quadrature points for`polyCub.SV`

, or the relative tolerance of`integrate`

in`polyCub.iso`

.

For instance,`siaf.gaussian(F.adaptive = TRUE)`

uses the midpoint-cubature`polyCub.midpoint`

with an adaptive bandwidth of`eps=adapt*sd`

to numerically integrate the kernel \(f(\bold{s})\), and the default`adapt`

value (0.1) can be overwritten by setting`control.siaf$F$adapt`

. However, the default version`siaf.gaussian()`

as well as`siaf.powerlaw()`

and friends use`polyCub.iso`

and thus accept control arguments for the standard`integrate`

routine (such as`rel.tol`

) via`control.siaf$F`

and`control.siaf$Deriv`

.

This argument list is ignored in the case`siaf=siaf.constant()`

(which is the default if`siaf`

is unspecified).- optim.args
an argument list passed to

`optim`

, or`NULL`

, in which case no optimization will be performed but the necessary functions will be returned in a list (similar to what is returned if`model = TRUE`

).Initial values for the parameters may be given as list element

`par`

in the order`(endemic, epidemic, siaf, tiaf)`

. If no initial values are provided, crude estimates will be used for the endemic intercept and the Gaussian kernel, -9 for the epidemic intercept, and zeroes for the remaining parameters. Any initial values given in the`start`

argument take precedence over those in`par`

.Note that

`optim`

receives the negative log-likelihood for minimization (thus, if used,`optim.args$control$fnscale`

should be positive). The`hessian`

argument defaults to`TRUE`

, and in the`control`

list,`trace`

ing is enabled with`REPORT=1`

by default. By setting`optim.args$control$trace = 0`

, all output from the optimization routine is suppressed.For the

`partial`

likelihood, the analytic score function and the Fisher information are not implemented and the default is to use robust`method="Nelder-Mead"`

optimization.There may be an extra component

`fixed`

in the`optim.args`

list, which determines which parameters should stick to their initial values. This can be specified by a logical vector of the same length as the`par`

component, by an integer vector indexing`par`

or by a character vector following the`twinstim`

naming conventions. Furthermore, if`isTRUE(fixed)`

, then all parameters are fixed at their initial values and no optimization is performed.Importantly, the

`method`

argument in the`optim.args`

list may also be`"nlminb"`

, in which case the`nlminb`

optimizer is used. This is also the default for full likelihood inference. In this case, not only the score function but also the*expected*Fisher information can be used during optimization (as estimated by what Martinussen and Scheike (2006, p. 64) call the “optional variation process”, or see Rathbun (1996, equation (4.7))). In our experience this gives better convergence than`optim`

's methods. For`method="nlminb"`

, the following parameters of the`optim.args$control`

list may be named like for`optim`

and are renamed appropriately:`maxit`

(->`iter.max`

),`REPORT`

(->`trace`

, default: 1),`abstol`

(->`abs.tol`

), and`reltol`

(->`rel.tol`

, default:`1e-6`

). For`nlminb`

, a logical`hessian`

argument (default:`TRUE`

) indicates if the negative*expected*Fisher information matrix should be used as the Hessian during optimization (otherwise a numerical approximation is used).Similarly,

`method="nlm"`

should also work but is not recommended here.- finetune
logical indicating if a second maximisation should be performed with robust Nelder-Mead

`optim`

using the resulting parameters from the first maximisation as starting point. This argument is only considered if`partial = FALSE`

and the default is to not conduct a second maximization (in most cases this does not improve upon the MLE).- model
logical indicating if the model environment should be kept with the result, which is required for

`intensityplot`

s and`R0(..., trimmed = FALSE)`

. Specifically, if`model=TRUE`

, the return value will have the evaluation environment set as its`environment`

, and the returned`functions`

element will contain the log-likelihood function (or partial log-likelihood function, if`partial = TRUE`

), and optionally the score and the expected Fisher information functions (not for the partial likelihood, and only if`siaf`

and`tiaf`

provide the necessary derivatives).

Note that fitted objects with a model environment might consume quiet a lot of memory since they contain the`data`

.- cumCIF
logical (default:

`FALSE`

) indicating whether to calculate the fitted cumulative ground intensity at event times. This is the residual process, see`residuals.twinstim`

.- cumCIF.pb
logical indicating if a progress bar should be shown during the calculation of

`cumCIF`

. Defaults to do so in an interactive R session, and will be`FALSE`

if`cores != 1`

.- cores
number of processes to use in parallel operation. By default

`twinstim`

runs in single-CPU mode. Currently, only the multicore-type of parallel computing via forking is supported, which is not available on Windows, see`mclapply`

in package parallel. Note that for a memoised`siaf.step`

kernel,`cores=1`

is fixed internally since parallelization would slow down model fitting significantly.- verbose
logical indicating if information should be printed during execution. Defaults to

`TRUE`

.

## Details

The function performs maximum likelihood inference
for the additive-multiplicative spatio-temporal intensity model
described in Meyer et al. (2012). It uses `nlminb`

as the
default optimizer and returns an object of class `twinstim`

.
Such objects have `print`

, `plot`

and
`summary`

methods.
The output of the `summary`

can be
processed by the `toLatex`

function. Furthermore, the usual model
fit methods such as `coef`

, `vcov`

, `logLik`

,
`residuals`

, and
`update`

are implemented.
A specific add-on is the use of the functions `R0`

and
`simulate`

.

## Value

Returns an S3 object of class `"twinstim"`

, which is a list with
the following components:

- coefficients
vector containing the MLE.

- loglik
value of the log-likelihood function at the MLE with a logical attribute

`"partial"`

indicating if the partial likelihood was used.- counts
number of log-likelihood and score evaluations during optimization.

- converged
either

`TRUE`

(if the optimizer converged) or a character string containing a failure message.- fisherinfo
*expected*Fisher information evaluated at the MLE. Only non-`NULL`

for full likelihood inference (`partial = FALSE`

) and if spatial and temporal interaction functions are provided with their derivatives.- fisherinfo.observed
observed Fisher information matrix evaluated at the value of the MLE. Obtained as the negative Hessian. Only non-

`NULL`

if`optim.args$method`

is not`"nlminb"`

and if it was requested by setting`hessian=TRUE`

in`optim.args`

.- fitted
fitted values of the conditional intensity function at the events.

- fittedComponents
two-column matrix with columns

`"h"`

and`"e"`

containing the fitted values of the endemic and epidemic components, respectively.

(Note that`rowSums(fittedComponents) == fitted`

.)- tau
fitted cumulative ground intensities at the event times. Only non-

`NULL`

if`cumCIF = TRUE`

. This is the “residual process” of the model, see`residuals.twinstim`

.- R0
estimated basic reproduction number for each event. This equals the spatio-temporal integral of the epidemic intensity over the observation domain (t0;T] x W for each event.

- npars
vector describing the lengths of the 5 parameter subvectors: endemic intercept(s) \(\beta_0(\kappa)\), endemic coefficients \(\beta\), epidemic coefficients \(\gamma\), parameters of the

`siaf`

kernel, and parameters of the`tiaf`

kernel.- qmatrix
the

`qmatrix`

associated with the epidemic`data`

as supplied in the model call.- bbox
the bounding box of

`data$W`

.- timeRange
the time range used for fitting:

`c(t0,T)`

.- formula
a list containing the four main parts of the model specification:

`endemic`

,`epidemic`

,`siaf`

, and`tiaf`

.- xlevels
a record of the levels of the factors used in fitting.

- control.siaf
see the “Arguments” section above.

- optim.args
input optimizer arguments used to determine the MLE.

- functions
if

`model=TRUE`

this is a`list`

with components`ll`

,`sc`

and`fi`

, which are functions evaluating the log-likelihood, the score function and the expected Fisher information for a parameter vector \(\theta\). The`environment`

of these function is the model environment, which is thus retained in the workspace if`model=TRUE`

. Otherwise, the`functions`

component is`NULL`

.- call
the matched call.

- runtime
the

`proc.time`

-queried time taken to fit the model, i.e., a named numeric vector of length 5 of class`"proc_time"`

, with the number of`cores`

set as additional attribute.

If `model=TRUE`

, the model evaluation environment is assigned to
this list and can thus be queried by calling `environment()`

on
the result.

## Note

`twinstim`

makes use of the memoise package if it is
available -- and that is highly recommended for non-constant
`siaf`

specifications to speed up calculations. Specifically, the
necessary numerical integrations of the spatial interaction function
will be cached such that they are only calculated once for every
state of the `siaf`

parameters during optimization.

## References

Diggle, P. J., Kaimi, I. & Abellana, R. (2010):
Partial-likelihood analysis of spatio-temporal point-process data.
*Biometrics*, **66**, 347-354.

Martinussen, T. and Scheike, T. H. (2006): Dynamic Regression Models for Survival Data. Springer.

Meyer, S. (2010):
Spatio-Temporal Infectious Disease Epidemiology based on Point Processes.
Master's Thesis, Ludwig-Maximilians-Universität
München.

Available as https://epub.ub.uni-muenchen.de/11703/

Meyer, S., Elias, J. and Höhle, M. (2012):
A space-time conditional intensity model for invasive meningococcal
disease occurrence. *Biometrics*, **68**, 607-616.
doi:10.1111/j.1541-0420.2011.01684.x

Meyer, S. and Held, L. (2014):
Power-law models for infectious disease spread.
*The Annals of Applied Statistics*, **8** (3), 1612-1639.
doi:10.1214/14-AOAS743

Meyer, S., Held, L. and Höhle, M. (2017):
Spatio-temporal analysis of epidemic phenomena using the R package
surveillance.
*Journal of Statistical Software*, **77** (11), 1-55.
doi:10.18637/jss.v077.i11

Rathbun, S. L. (1996):
Asymptotic properties of the maximum likelihood estimator for
spatio-temporal point processes.
*Journal of Statistical Planning and Inference*, **51**, 55-74.

## See also

`vignette("twinstim")`

.

There is a `simulate.twinstim`

method,
which simulates the point process based on the fitted `twinstim`

.

A discrete-space alternative is offered by the `twinSIR`

modelling framework.

## Examples

```
# Load invasive meningococcal disease data
data("imdepi")
### first, fit a simple endemic-only model
m_noepi <- twinstim(
endemic = addSeason2formula(~ offset(log(popdensity)) + I(start/365-3.5),
S=1, period=365, timevar="start"),
data = imdepi, subset = !is.na(agegrp)
)
## look at the model summary
summary(m_noepi)
## there is no evidence for a type-dependent endemic intercept (LR test)
m_noepi_type <- update(m_noepi, endemic = ~(1|type) + .)
pchisq(2*c(logLik(m_noepi_type)-logLik(m_noepi)), df=1, lower.tail=FALSE)
### add an epidemic component with just the intercept, i.e.
### assuming uniform dispersal in time and space up to a distance of
### eps.s = 200 km and eps.t = 30 days (see summary(imdepi))
m0 <- update(m_noepi, epidemic=~1, model=TRUE)
## summarize the model fit
s <- summary(m0, correlation = TRUE, symbolic.cor = TRUE)
s
# output the table of coefficients as LaTeX code
toLatex(s, digits=2)
# or, to report rate ratios
xtable(s)
## the default confint-method can be used for Wald-CI's
confint(m0, level=0.95)
## same "untrimmed" R0 for every event (simple epidemic intercept model)
summary(R0(m0, trimmed=FALSE))
## plot the path of the fitted total intensity
plot(m0, "total intensity", tgrid=500)
if (surveillance.options("allExamples")) {
## extract "residual process" integrating over space (takes some seconds)
res <- residuals(m0)
# if the model describes the true CIF well _in the temporal dimension_,
# then this residual process should behave like a stationary Poisson
# process with intensity 1
plot(res, type="l"); abline(h=c(0, length(res)), lty=2)
# easier, with CI and serial correlation -> checkResidualProcess()
checkResidualProcess(m0)
}
if (FALSE) {
## NB: in contrast to nlminb(), optim's BFGS would miss the
## likelihood maximum wrt the epidemic intercept
m0_BFGS <- update(m_noepi, epidemic=~1, optim.args = list(method="BFGS"))
format(cbind(nlminb=coef(m0), BFGS=coef(m0_BFGS)), digits=3, scientific=FALSE)
m0_BFGS$fisherinfo # singular Fisher information matrix here
m0$fisherinfo
logLik(m0_BFGS)
logLik(m0)
## nlminb is more powerful since we make use of the analytical fisherinfo
## as estimated by the model during optimization, which optim cannot
}
### an epidemic-only model?
## for a purely epidemic model, all events must have potential source events
## (otherwise the intensity at the observed event would be 0)
## let's focus on the C-type for this example
imdepiC <- subset(imdepi, type == "C")
table(summary(imdepiC)$nSources)
## 106 events have no prior, close events (in terms of eps.s and eps.t)
try(twinstim(epidemic = ~1, data = imdepiC)) # detects this problem
## let's assume spatially unbounded interaction
imdepiC_infeps <- update(imdepiC, eps.s = Inf)
(s <- summary(imdepiC_infeps))
table(s$nSources)
## for 11 events, there is no prior event within eps.t = 30 days
## (which is certainly true for the first event)
plot(s$counter, main = "Number of infectious individuals over time (eps.t = 30)")
rug(imdepiC_infeps$events$time)
rug(imdepiC_infeps$events$time[s$nSources == 0], col = 2, lwd = 3)
## An endemic component would catch such events (from unobserved sources),
## otherwise a longer infectious period would need to be assumed and
## for the first event to happen, a prehistory is required (e.g., t0 = 31).
## As an example, we fit the data only until T = 638 (all events have ancestors)
m_epi <- twinstim(epidemic = ~1, data = imdepiC_infeps, t0 = 31, T = 638)
summary(m_epi)
if (surveillance.options("allExamples")) withAutoprint({
### full model with interaction functions (time-consuming)
## estimate an exponential temporal decay of infectivity
m1_tiaf <- update(m0, tiaf=tiaf.exponential())
plot(m1_tiaf, "tiaf", scaled=FALSE)
## estimate a step function for spatial interaction
summary(sourceDists <- getSourceDists(imdepi, "space"))
(knots <- quantile(sourceDists, c(5,10,20,40)/100))
m1_fstep <- update(m0, siaf=knots)
plot(m1_fstep, "siaf", scaled=FALSE)
rug(sourceDists, ticksize=0.02)
## estimate a continuously decreasing spatial interaction function,
## here we use the kernel of an isotropic bivariate Gaussian
m1 <- update(m0, siaf = siaf.gaussian())
AIC(m_noepi, m0, m1_fstep, m1)
summary(m1) # e.siaf.1 is log(sigma), no test for H0: log(sigma) = 0
exp(confint(m1, "e.siaf.1")) # a confidence interval for sigma
plot(m1, "siaf", scaled=FALSE)
## alternative: siaf.powerlaw() with eps.s=Inf and untie()d data,
## see vignette("twinstim")
## add epidemic covariates
m2 <- update(m1, epidemic = ~ 1 + type + agegrp)
AIC(m1, m2) # further improvement
summary(m2)
## look at estimated R0 values by event type
tapply(R0(m2), imdepi$events@data[names(R0(m2)), "type"], summary)
})
```