Run length computation of a CUSUM detector
LRCUSUM.runlength.Rd
Compute run length for a count data or categorical CUSUM. The computations are based on a Markov representation of the likelihood ratio based CUSUM.
Arguments
- mu
matrix with true proportions, i.e. equal to mu0 or mu1 if one wants to compute e.g. or .
- mu0
matrix with in-control proportions
- mu1
matrix with out-of-control proportion
- h
The threshold h which is used for the CUSUM.
- dfun
The probability mass function or density used to compute the likelihood ratios of the CUSUM. In a negative binomial CUSUM this is
dnbinom
, in a binomial CUSUMdbinom
and in a multinomial CUSUMdmultinom
.- n
Vector of length containing the total number of experiments for each time point.
- g
The number of levels to cut the state space into when performing the Markov chain approximation. Sometimes also denoted . Note that the quality of the approximation depends very much on
g
. If is greater than, say, 50 it's necessary to increase the value ofg
.- outcomeFun
A hook
function (k,n)
to compute all possible outcome states to compute the likelihood ratio for. IfNULL
then the internal default functionsurveillance:::outcomeFunStandard
is used. This function uses the Cartesian product of0:n
fork
components.- ...
Additional arguments to send to
dfun
.
Details
Brook and Evans (1972) formulated an approximate approach based on Markov chains to determine the PMF of the run length of a time-constant CUSUM detector. They describe the dynamics of the CUSUM statistic by a Markov chain with a discretized state space of size . This is adopted to the time varying case in Höhle (2010) and implemented in R using the ... notation such that it works for a very large class of distributions.
Value
A list with five components
- P
An array of transition matrices of the approximation Markov chain.
- pmf
Probability mass function (up to length ) of the run length variable.
- cdf
Cumulative density function (up to length ) of the run length variable.
- arl
If the model is time homogeneous (i.e. if ) then the ARL is computed based on the stationary distribution of the Markov chain. See the eqns in the reference for details. Note: If the model is not time homogeneous then the function returns
NA
and the ARL has to be approximated manually from the output. One could usesum(1:length(pmf) * pmf)
, which is an approximation because of using a finite support for a sum which should be from 1 to infinity.
References
Höhle, M. (2010): Online change-point detection in categorical time series. In: T. Kneib and G. Tutz (Eds.), Statistical Modelling and Regression Structures - Festschrift in Honour of Ludwig Fahrmeir, Physica-Verlag, pp. 377-397. Preprint available as https://staff.math.su.se/hoehle/pubs/hoehle2010-preprint.pdf
Höhle, M. and Mazick, A. (2010): Aberration detection in R illustrated by Danish mortality monitoring. In: T. Kass-Hout and X. Zhang (Eds.), Biosurveillance: A Health Protection Priority, CRCPress. Preprint available as https://staff.math.su.se/hoehle/pubs/hoehle_mazick2009-preprint.pdf
Brook, D. and Evans, D. A. (1972): An approach to the probability distribution of cusum run length. Biometrika 59(3):539-549.
Examples
######################################################
#Run length of a time constant negative binomial CUSUM
######################################################
#In-control and out of control parameters
mu0 <- 10
alpha <- 1/2
kappa <- 2
#Density for comparison in the negative binomial distribution
dY <- function(y,mu,log=FALSE, alpha, ...) {
dnbinom(y, mu=mu, size=1/alpha, log=log)
}
#In this case "n" is the maximum value to investigate the LLR for
#It is assumed that beyond n the LLR is too unlikely to be worth
#computing.
LRCUSUM.runlength( mu=t(mu0), mu0=t(mu0), mu1=kappa*t(mu0), h=5,
dfun = dY, n=rep(100,length(mu0)), alpha=alpha)
h.grid <- seq(3,6,by=0.3)
arls <- sapply(h.grid, function(h) {
LRCUSUM.runlength( mu=t(mu0), mu0=t(mu0), mu1=kappa*t(mu0), h=h,
dfun = dY, n=rep(100,length(mu0)), alpha=alpha,g=20)$arl
})
plot(h.grid, arls,type="l",xlab="threshold h",ylab=expression(ARL[0]))
######################################################
#Run length of a time varying negative binomial CUSUM
######################################################
mu0 <- matrix(5*sin(2*pi/52 * 1:150) + 10,ncol=1)
rl <- LRCUSUM.runlength( mu=t(mu0), mu0=t(mu0), mu1=kappa*t(mu0), h=2,
dfun = dY, n=rep(100,length(mu0)), alpha=alpha,g=20)
plot(1:length(mu0),rl$pmf,type="l",xlab="t",ylab="PMF")
plot(1:length(mu0),rl$cdf,type="l",xlab="t",ylab="CDF")
########################################################
# Further examples contain the binomial, beta-binomial
# and multinomial CUSUMs. Hopefully, these will be added
# in the future.
########################################################
#dfun function for the multinomial distribution (Note: Only k-1 categories are specified).
dmult <- function(y, size,mu, log = FALSE) {
return(dmultinom(c(y,size-sum(y)), size = size, prob=c(mu,1-sum(mu)), log = log))
}
#Example for the time-constant multinomial distribution
#with size 100 and in-control and out-of-control parameters as below.
n <- 100
pi0 <- as.matrix(c(0.5,0.3,0.2))
pi1 <- as.matrix(c(0.38,0.46,0.16))
#ARL_0
LRCUSUM.runlength(mu=pi0[1:2,,drop=FALSE],mu0=pi0[1:2,,drop=FALSE],mu1=pi1[1:2,,drop=FALSE],
h=5,dfun=dmult, n=n, g=15)$arl
#ARL_1
LRCUSUM.runlength(mu=pi1[1:2,,drop=FALSE],mu0=pi0[1:2,,drop=FALSE],mu1=pi1[1:2,,drop=FALSE],
h=5,dfun=dmult, n=n, g=15)$arl