# Bayesian modelling of left-censored data using JAGS

Martyn Plummer's JAGS very helpfully provides us with a way to model censored data through the use of the dinterval distribution. However, almost all of the examples that one finds on the web are for right-censored data. The changes needed to model left-censored data are not major, but I do think they warrant a) a post/page of their own and b) hopefully an easy-to-understand example.

Left-censored data arises very commonly when dealing with detection limits from instrumentation. In my own work, I often end up involved in the modelling of data derived from electropherograms.

I will start by generating some left-censored data. For simplicity I am going to assume that my data is exponentially distributed, with a true rate of 1.05 ($$\lambda = 1.05$$), and a detection/censoring threshold at log(29). This means that approximately 97.1% of my data (on average) will fall below my detection threshold. This may seem extreme, but it is the kind of setup that is common in my own work.
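The 97.1% figure follows directly from the exponential distribution function, $$P(X < \log 29) = 1 - e^{-1.05 \log 29}$$. A quick check in base R (the variable names are chosen here purely for illustration):

```r
## probability that an Exp(1.05) observation falls below the LOD
lambdaTrue = 1.05
LOD = log(29)

pCensored = pexp(LOD, rate = lambdaTrue)

## the same quantity written out by hand: 1 - 29^(-1.05)
pByHand = 1 - 29^(-lambdaTrue)

## both are roughly 0.971
round(c(pexp = pCensored, byHand = pByHand), 3)
```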

set.seed(35202)
x = rexp(10000, rate = 1.05)

## set all the censored values to NA's
x[x < log(29)] = NA


This gives us a data set of size 10,000 with 9,691 values that fall below the detection threshold.

It can be a useful check, if feasible, to see what the maximum likelihood estimate is. We can do this in R using the optim function.
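For reference, the log-likelihood being maximised can be written out explicitly. This is the standard form for left-censored data, specialised to the exponential distribution; the symbols $$\mathcal{O}$$ (the set of observed indices), $$n_o$$ (the number of observed values) and $$n_c$$ (the number of censored values) are notation introduced here and do not appear in the code.

$$
\ell(\lambda) = \sum_{i \in \mathcal{O}} \log\left(\lambda e^{-\lambda x_i}\right) + n_c \log\left(1 - e^{-\lambda\,\mathrm{LOD}}\right) = n_o \log\lambda - \lambda \sum_{i \in \mathcal{O}} x_i + n_c \log\left(1 - e^{-\lambda\,\mathrm{LOD}}\right)
$$

Each observed value contributes its log-density, and each censored value contributes the log-probability of falling below the limit of detection.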

## define the log-likelihood
logLik = function(lambda){
  isCensored = is.na(x)
  nCensored = sum(isCensored)
  LOD = log(29)

  ## observed values contribute their log-density; each
  ## censored value contributes log P(X < LOD)
  ll = sum(dexp(x[!isCensored], rate = lambda,
                log = TRUE)) +
    nCensored * pexp(LOD, rate = lambda,
                     log.p = TRUE)

  ## return the -ve value because optim
  ## is a minimizer
  return(-ll)
}

fit = optim(0.5, logLik, method = "Brent",
            lower = 0, upper = 10,
            hessian = TRUE)

## the Hessian of the negative log-likelihood is the observed
## Fisher information; its inverse estimates the variance of the MLE
fisherInfo = fit$hessian
sigma = sqrt(1 / fisherInfo)
upper = fit$par + 1.96 * sigma
lower = fit$par - 1.96 * sigma
interval = c(lower, fit$par, upper)
names(interval) = c("lower", "MLE", "upper")
interval

## lower   MLE upper
## 1.001 1.032 1.064


So the 95% confidence interval contains our true value, which is always a good start!

The trick, if there is any, to dealing with left-censored data in JAGS is to make sure that your indicator variable tells JAGS which observations are above the detection threshold.

So in the next step I will set up the list that contains my data.

bugsData = list(N = length(x),
isAboveLOD = ifelse(!is.na(x),
1, 0),
x = x,
LOD = rep(log(29), length(x)))


There are two points to make about the preceding code. Firstly, the variable isAboveLOD uses the NA status of the data. If you have not recoded your censored values to NA then obviously this will not work. Secondly, the limit of detection vector LOD has an entry for every observation, regardless of whether the observation is below the limit of detection or not.
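If your censored values arrive in some other form, they need recoding first. As a minimal sketch, suppose (hypothetically) that the instrument reports every reading below the limit of detection as the limit itself, together with a logical flag. The names `raw`, `belowLOD`, `xRecoded` and `toyData` are made up for this illustration.

```r
## hypothetical raw data: censored readings are reported as the LOD
LOD = log(29)
raw = c(LOD, 3.9, LOD, LOD, 4.6)
belowLOD = c(TRUE, FALSE, TRUE, TRUE, FALSE)

## recode the censored readings to NA so that JAGS treats
## them as unobserved
xRecoded = raw
xRecoded[belowLOD] = NA

## build the data list in the same shape as bugsData
toyData = list(N = length(xRecoded),
               isAboveLOD = ifelse(!is.na(xRecoded), 1, 0),
               x = xRecoded,
               LOD = rep(LOD, length(xRecoded)))

## the indicator is 0 for censored and 1 for observed values
toyData$isAboveLOD
```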

Next we need to set up some initial values. The key here is setting not only an initial value for $$\lambda$$, but also initial values for the observations that have been censored.

bugsInits = list(list(lambda = 0.5,
x = rep(NA, length(x))))

## set the missing values to
## random variates from
## U(0, log(29))
nMissing = sum(!bugsData$isAboveLOD)
bugsInits[[1]]$x[!bugsData$isAboveLOD] = runif(nMissing, 0, log(29))

I have chosen uniform random variates between zero and the limit of detection (log(29)) as initial values for my censored data.

Now we need to set up our JAGS model. I will store this in a string, and then use writeLines to write this to disk.

modelString = "
model{
  for(i in 1:N){
    isAboveLOD[i] ~ dinterval(x[i], LOD[i])
    x[i] ~ dexp(lambda)
  }
  lambda ~ dgamma(0.001, 0.001)
}
"

writeLines(modelString, con = "model.bugs.R")

Note that I have used a vague gamma prior for $$\lambda$$. This is not especially noteworthy, except that I want to be explicit about what I have done. dinterval returns 0 if x[i] <= LOD[i] and 1 otherwise.

Many people who try to use dinterval get this cryptic error: Observed node inconsistent with unobserved parents at initialization. This can happen for two reasons. Firstly, the indicator variable can be incorrectly set, that is, the observations above the limit of detection have been coded with a 0 instead of a 1, and vice versa for the censored observations. Secondly, the error can occur because the initial values for the censored observations are outside of the censored interval.

We now have the three components necessary to fit our model: a list containing the data, a list of initial values, and a BUGS model. Firstly, we initialize the model.

library(rjags)

## Loading required package: coda
## Loading required package: lattice
## Linked to JAGS 3.3.0
## Loaded modules: basemod,bugs

jagsModel = jags.model(file = "model.bugs.R", data = bugsData, inits = bugsInits)

## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph Size: 30003
##
## Initializing model

Next, we let the model burn in for an arbitrary period. I will use a burn-in period of 1,000 iterations.

update(jagsModel, n.iter = 1000)

And finally, we take a sample (hopefully) from the posterior distribution of the parameter of interest, $$\lambda$$.

parameters = c("lambda")
simSamples = coda.samples(jagsModel, variable.names = parameters, n.iter = 10000)
stats = summary(simSamples)

We can obtain an approximate 95% interval by using the posterior mean and standard deviation (a normal approximation to the posterior), i.e.

mx = stats$statistics[1]
sx = stats$statistics[2]
ci = mx + c(-1, 1) * 1.96 * sx
ci

## [1] 1.002 1.065

or we can get a 95% credible interval by using the posterior quantiles, i.e.

## keep only the 2.5% and 97.5% quantiles
ci = stats$quantiles[c(1, 5)]
ci

##  2.5% 97.5%
## 1.002 1.065


Both intervals contain the true value of 1.05 and give a fairly reasonable estimate of it as well. They are also very close to the ML interval.
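Returning to the initialization error mentioned earlier: both of its causes can be checked in plain R before the model is handed to JAGS. The function below is a hypothetical helper (`checkCensoringSetup` is not part of rjags), written as a sketch that assumes the data and initial-value lists are laid out as in this post.

```r
## hypothetical helper: not part of rjags
checkCensoringSetup = function(data, init){
  censored = is.na(data$x)

  ## 1. the indicator must be 0 for censored observations
  ##    and 1 for observed ones
  indicatorOK = all(data$isAboveLOD[censored] == 0) &&
    all(data$isAboveLOD[!censored] == 1)

  ## 2. observed values must lie above their LOD
  observedOK = all(data$x[!censored] > data$LOD[!censored])

  ## 3. initial values for censored observations must be present
  ##    and lie inside the censored interval (0, LOD)
  initCensored = init$x[censored]
  initsOK = !anyNA(initCensored) &&
    all(initCensored > 0 & initCensored < data$LOD[censored])

  c(indicator = indicatorOK, observed = observedOK, inits = initsOK)
}

## toy example with two censored and two observed values
LOD = log(29)
toyData = list(N = 4,
               isAboveLOD = c(0, 1, 0, 1),
               x = c(NA, 3.9, NA, 4.6),
               LOD = rep(LOD, 4))
goodInit = list(lambda = 0.5, x = c(1.2, NA, 0.4, NA))
badInit = list(lambda = 0.5, x = c(5.0, NA, 0.4, NA))

checkCensoringSetup(toyData, goodInit)  ## all three checks pass
checkCensoringSetup(toyData, badInit)   ## the inits check fails
```

For the objects in this post, the equivalent call would be checkCensoringSetup(bugsData, bugsInits[[1]]).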

I hope this clears things up for someone.

### Credit where credit is due

I could not have written this without following the right censored example provided by John Kruschke here, and from reading Martyn Plummer's presentation here.

## 4 thoughts on “Bayesian modelling of left-censored data using JAGS”

1. Paul Johnson says:

Thanks, that’s really helpful, especially as the John Kruschke and Martyn Plummer links seem to be broken now.

2. Justin says:

Thank you for this post. Unfortunately, I still do not fully understand how your code accounts for the left censoring.
In my understanding,
For the data,
isAboveLOD = ifelse(!is.na(x),1, 0)
gives a value of 0 below the limit and 1 above the limit.
And
isAboveLOD[i] ~ dinterval(x[i], LOD[i])
also gives a value of 0 below the limit and 1 above the limit.

So isn’t this still describing right censoring?

The isAboveLOD is used in conjunction with x, specifically, when x = NA which denotes that it is unobserved. In this case, the JAGS user manual says (rewritten using the symbols in this example): When x is unobserved, the likelihood from the dinterval distribution imposes the a posteriori restriction that x must lie in the interval 0 < x < LOD. Censored data occur when outcomes are not observed directly but are known only to be above a certain value (right-censoring), or below a certain value (left-censoring), or in between two values (interval-censoring). The dinterval distribution can be used to represent all three forms of censoring.