Compute the run length distribution of a count data or categorical CUSUM. The computations are based on a Markov chain representation of the likelihood-ratio-based CUSUM.

## Usage

LRCUSUM.runlength(mu,mu0,mu1,h,dfun, n, g=5,outcomeFun=NULL,...)

## Arguments

mu

$$(k-1) \times T$$ matrix with the true proportions, i.e. equal to mu0 or mu1 if one wants to compute e.g. $$ARL_0$$ or $$ARL_1$$, respectively.

mu0

$$(k-1) \times T$$ matrix with in-control proportions.

mu1

$$(k-1) \times T$$ matrix with out-of-control proportions.

h

The threshold h which is used for the CUSUM.

dfun

The probability mass function or density used to compute the likelihood ratios of the CUSUM. In a negative binomial CUSUM this is dnbinom, in a binomial CUSUM dbinom and in a multinomial CUSUM dmultinom.

n

Vector of length $$T$$ containing the total number of experiments for each time point.

g

The number of levels to cut the state space into when performing the Markov chain approximation. Sometimes also denoted $$M$$. Note that the quality of the approximation depends strongly on $$g$$. If $$T$$ is greater than, say, 50, it is necessary to increase the value of $$g$$.

outcomeFun

A hook function with arguments (k,n) that computes all possible outcome states for which the likelihood ratio is evaluated. If NULL, the internal default function surveillance:::outcomeFunStandard is used. This function uses the Cartesian product of 0:n for k components.

...

Additional arguments to send to dfun.
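How the outcome states and dfun combine into likelihood ratios can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: `outcome_grid` is a hypothetical stand-in for an outcome hook, the values of `n`, `p0`, `p1` and the series `obs` are made up, and `dbinom` plays the role of dfun for a binomial CUSUM.

```r
# Hypothetical stand-in for an outcome hook with arguments (k, n):
# all combinations of 0:n for k components, one outcome per row.
outcome_grid <- function(k, n) {
  as.matrix(expand.grid(rep(list(0:n), k)))
}

n  <- 20    # number of experiments at one time point (illustrative)
p0 <- 0.1   # in-control proportion (illustrative)
p1 <- 0.2   # out-of-control proportion (illustrative)

# Binomial case: one component, so the outcomes are simply 0..n
outcomes <- outcome_grid(k = 1, n = n)
y <- outcomes[, 1]

# Log-likelihood ratio of every outcome, with dbinom in the role of dfun
llr <- dbinom(y, size = n, prob = p1, log = TRUE) -
       dbinom(y, size = n, prob = p0, log = TRUE)

# CUSUM recursion C_t = max(0, C_{t-1} + llr_t) on a made-up series of counts
obs   <- c(2, 1, 4, 5, 3)
cusum <- numeric(length(obs))
prev  <- 0
for (t in seq_along(obs)) {
  prev     <- max(0, prev + llr[obs[t] + 1])   # llr is indexed by y + 1
  cusum[t] <- prev
}
cusum
```

With more than one component the grid grows as $$(n+1)^k$$ rows, which is why a custom hook that drops impossible or negligible outcomes (for instance `out[rowSums(out) <= n, , drop = FALSE]`) can save considerable work.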

## Details

Brook and Evans (1972) formulated an approximate approach based on Markov chains to determine the PMF of the run length of a time-constant CUSUM detector. They describe the dynamics of the CUSUM statistic by a Markov chain with a discretized state space of size $$g+2$$. This is adapted to the time-varying case in Höhle (2010) and implemented in R using the ... notation such that it works for a very large class of distributions.
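The discretization idea can be sketched for a time-constant binomial CUSUM. The following is a minimal illustration, not the package implementation: the helper names `cusum_transitions` and `arl_from_P`, the use of interval midpoints as representative values, and all parameter values are assumptions made for this example. The ARL is obtained with the standard absorbing-chain formula $$(I - Q)^{-1} 1$$, where $$Q$$ is the transient part of the transition matrix.

```r
# Hypothetical helper: transition matrix of a discretised binomial LR-CUSUM.
# States: 1 = {C = 0}, 2..(g + 1) = g equal-width intervals of (0, h),
# g + 2 = absorbing alarm state {C >= h}.
cusum_transitions <- function(p, p0, p1, n, h, g) {
  w    <- h / g                            # width of each interval
  reps <- c(0, (seq_len(g) - 0.5) * w)     # 0, then the interval midpoints
  y    <- 0:n
  llr  <- dbinom(y, n, p1, log = TRUE) - dbinom(y, n, p0, log = TRUE)
  prob <- dbinom(y, n, p)                  # outcome probabilities under the true p
  P <- matrix(0, g + 2, g + 2)
  for (i in seq_len(g + 1)) {
    cnew <- pmax(0, reps[i] + llr)         # C_t = max(0, C_{t-1} + LLR_t)
    to   <- ifelse(cnew <= 0, 1,
                   ifelse(cnew >= h, g + 2, ceiling(cnew / w) + 1))
    to   <- pmin(to, g + 2)                # guard against rounding at the boundary
    for (j in seq_along(y)) P[i, to[j]] <- P[i, to[j]] + prob[j]
  }
  P[g + 2, g + 2] <- 1                     # once an alarm is raised, stay there
  P
}

# Expected number of steps until absorption when starting in state 1 (C = 0)
arl_from_P <- function(P) {
  m <- nrow(P) - 1
  Q <- P[seq_len(m), seq_len(m)]           # transitions among transient states
  solve(diag(m) - Q, rep(1, m))[1]
}

P0 <- cusum_transitions(p = 0.1, p0 = 0.1, p1 = 0.2, n = 20, h = 3, g = 10)
P1 <- cusum_transitions(p = 0.2, p0 = 0.1, p1 = 0.2, n = 20, h = 3, g = 10)
arl0 <- arl_from_P(P0)   # in-control ARL
arl1 <- arl_from_P(P1)   # out-of-control ARL
c(arl0 = arl0, arl1 = arl1)
```

Rerunning the sketch with a larger `g` shows how sensitive the approximation is to the fineness of the grid.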

See also categoricalCUSUM.

## Value

A list with the following components

P

An array of $$(g+2) \times (g+2)$$ transition matrices of the approximating Markov chain.

pmf

Probability mass function (up to length $$T$$) of the run length variable.

cdf

Cumulative distribution function (up to length $$T$$) of the run length variable.

arl

If the model is time homogeneous (i.e. if $$T=1$$) then the ARL is computed based on the stationary distribution of the Markov chain. See the equations in the reference for details. Note: If the model is not time homogeneous then the function returns NA and the ARL has to be approximated manually from the output. One could use sum(1:length(pmf) * pmf), which is only an approximation because it uses a finite support for a sum that should run from 1 to infinity.
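The relation between time-varying transition matrices, the run length PMF and CDF, and the truncated ARL sum can be illustrated with a toy chain. Everything here is an assumption made for the example: the three-state matrices are invented, and they are stored in a plain list (`P_list`) rather than in the array layout returned by the function.

```r
# Toy time-varying chain: states 1 and 2 are transient, state 3 is the alarm.
n_steps <- 5
P_list <- lapply(seq_len(n_steps), function(t) {
  a <- 0.05 + 0.01 * t            # alarm probability from state 1 grows with t
  rbind(c(0.7, 0.3 - a, a),
        c(0.4, 0.4,     0.2),
        c(0,   0,       1))       # absorbing alarm state
})

# Propagate the state distribution; the mass in the absorbing state at time t
# is P(run length <= t).
dist <- c(1, 0, 0)                # start in state 1
cdf  <- numeric(n_steps)
for (t in seq_len(n_steps)) {
  dist   <- as.vector(dist %*% P_list[[t]])
  cdf[t] <- dist[3]
}
pmf <- diff(c(0, cdf))            # P(run length = t)

# Truncated ARL approximation; the tail mass 1 - cdf[n_steps] is ignored
arl_truncated <- sum(seq_along(pmf) * pmf)
tail_mass     <- 1 - cdf[n_steps]
c(arl_truncated = arl_truncated, tail_mass = tail_mass)
```

As long as `tail_mass` is not close to zero, the truncated sum underestimates the ARL, so it is worth checking the last value of the CDF before relying on it.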

## References

Höhle, M. (2010): Online change-point detection in categorical time series. In: T. Kneib and G. Tutz (Eds.), Statistical Modelling and Regression Structures - Festschrift in Honour of Ludwig Fahrmeir, Physica-Verlag, pp. 377-397. Preprint available as https://staff.math.su.se/hoehle/pubs/hoehle2010-preprint.pdf

Höhle, M. and Mazick, A. (2010): Aberration detection in R illustrated by Danish mortality monitoring. In: T. Kass-Hout and X. Zhang (Eds.), Biosurveillance: A Health Protection Priority, CRC Press. Preprint available as https://staff.math.su.se/hoehle/pubs/hoehle_mazick2009-preprint.pdf

Brook, D. and Evans, D. A. (1972): An approach to the probability distribution of cusum run length. Biometrika 59(3):539-549.