For a current piece of work I'm trying to model the probability of death for beech trees in a woodland in the UK. I have records of whether trees were alive or dead for 3 different census periods, along with data on their diameter and growth rate. Each tree has an ID number so it can be identified at each time interval. However, the census intervals vary, so that the time between one survey and the next is either 4, 12 or 18 years. Obviously, the longer the census interval, the greater the probability that a tree will have died by the time it is next surveyed. I had problems making a realistic reproducible example, so you can find the data here.
The variables in the dataset are:
- ID - unique ID for the tree
- Block - ID of the 20x20 m plot in which the tree is located
- Dead - status of the tree, either dead (1) or alive (0)
- GR - annual growth rate since the previous survey
- DBH - diameter of the tree at breast height
- SL - length of time between censuses, in years
Once a tree is recorded as dead it disappears from subsequent surveys.
Ideally I would like to be able to estimate the annual probability of mortality of a tree using information on its diameter and growth rate. Having searched around for quite a while, I have seen that logistic exposure models appear to be able to account for differences in census intervals by using a modified logit link for binomial models, as detailed by Ben Bolker here. This approach was originally used by Shaffer to estimate the daily probability of nest survival for birds, where the age (and therefore exposure) of each nest differed. I've not seen it used outside the context of nest survival models, but it seems like I should be able to use it to model survival/mortality where the exposure differs.
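If I have understood the link correctly, the idea is that if $\eta$ is the linear predictor, then $\operatorname{logit}^{-1}(\eta)$ is the probability per unit of time (per year in my case), and the probability over an interval of $t$ years is

$$\mu = \left[\operatorname{logit}^{-1}(\eta)\right]^{t},$$

which is what the linkinv line in the code below implements, with $t$ supplied as the exposure.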
require(MASS)
logexp <- function(exposure = 1)
{
    linkfun <- function(mu) qlogis(mu^(1/exposure))
    ## FIXME: is there some trick we can play here to allow
    ## evaluation in the context of the 'data' argument?
    linkinv <- function(eta) plogis(eta)^exposure
    logit_mu_eta <- function(eta) {
        ifelse(abs(eta) > 30, .Machine$double.eps,
               exp(eta)/(1 + exp(eta))^2)
        ## OR .Call(stats:::C_logit_mu_eta, eta, PACKAGE = "stats")
    }
    mu.eta <- function(eta) {
        exposure * plogis(eta)^(exposure - 1) *
            logit_mu_eta(eta)
    }
    valideta <- function(eta) TRUE
    link <- paste("logexp(", deparse(substitute(exposure)), ")",
                  sep = "")
    structure(list(linkfun = linkfun, linkinv = linkinv,
                   mu.eta = mu.eta, valideta = valideta,
                   name = link),
              class = "link-glm")
}
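As a quick sanity check on how I think the link behaves (the 0.98 and the 12-year exposure here are just made-up numbers for illustration):

lnk <- logexp(12)              # a single 12-year exposure
eta <- lnk$linkfun(0.98^12)    # link-scale value for an interval probability of 0.98^12
plogis(eta)                    # 0.98 -> the per-year probability
lnk$linkinv(eta)               # 0.98^12 ~ 0.785 -> back to the interval probability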
At the moment my model looks like this, but I will incorporate more variables as I go along:
require(lme4)
Dead <- read.csv("Stack_dead.csv")
## (1|ID) is included to account for repeated measurements of the same individuals
M1 <- glmer(Dead ~ DBH + (1 | ID), data = Dead,
            family = binomial(logexp(Dead$SL)))
summary(M1)
plot(Dead$DBH, plogis(predict(M1, re.form = NA)))
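For context, this is roughly how I intend to draw a prediction curve over a range of diameters, using fixed effects only (dbh_grid and the 100-point grid are just placeholders, and the plogis step is the part I am unsure about):

dbh_grid <- data.frame(DBH = seq(min(Dead$DBH), max(Dead$DBH), length.out = 100))
eta_hat <- predict(M1, newdata = dbh_grid, re.form = NA)  # linear predictor (link scale)
dbh_grid$p_annual <- plogis(eta_hat)                      # intended as the annual mortality probability
plot(p_annual ~ DBH, data = dbh_grid, type = "l")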
Primarily I want to know:
- Does the statistical technique I am using to control for the difference in time between censuses seem sensible? If it isn't, can you think of a better way to deal with the problem?
- If the answer to the first question is yes, is using the inverse logit (plogis) the correct way to get predictions expressed as probabilities?
Thanks in advance for any help!
offset(log(exposure))? – colin Jul 28 '19 at 06:22
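If that comment means replacing the custom link with a complementary log-log link plus log(SL) as an offset (a discrete-time proportional-hazards style formulation), I think it would look roughly like this sketch (M2 is just an illustrative name):

## cloglog link with a log-exposure offset: the intercept then refers to a
## one-year interval, and longer censuses are handled through the offset
M2 <- glmer(Dead ~ DBH + offset(log(SL)) + (1 | ID),
            data = Dead, family = binomial(link = "cloglog"))
## annual mortality at a given DBH would then be 1 - exp(-exp(eta)),
## with eta the linear predictor evaluated at SL = 1 (i.e. offset = 0)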