According to A.F. Zuur et al. (2009) in 'Mixed Effects Models and Extensions in Ecology with R', p. 239ff and this post: When to use an offset in a Poisson regression?, offset() can be used in Ecology when samples vary in volume (or area, depth range etc.) and are thus a (better) alternative to using the density as a response. The coefficient in front of it is then forced to be 1.
Can I also use offset() to weight counted insect stages based on their damaging effect (in my case leaf consumption)?
Since normally counts with a higher value in the offset are weighted lower (same count in higher volume means lower density), I thought of using the reciprocal value of feeding rate in the offset (same count with higher feeding rate is somehow similar to a higher density). So I have the counts of 5 stages of an insect (Imago, L1, L2, L3, L4) on the same plant. The problem I see is, that these stages are correlated. I tried to account for the correlation by using plant as a random effect.
I simulated some data inspired by data I observed in a field experiment and ran a GLM:
library(simstudy) # package to model correlated Poisson distributions
l_t <- c(0.1, 1.5, 1.0, 1.5, 3.5)
# lambdas for Imagines, and larval stages L1-L4 for treated plants
l_ut <- c(0.2, 3.0, 2.0, 0.5, 0.2)
# for untreated plants (more L1 & L2 but less L3 & L4)
r <- matrix(c(1.00, 0.07, 0.18,0.12, 0.00,
0.07, 1.00, 0.12,0.06,-0.08,
0.18, 0.12, 1.00,0.43,-0.04,
0.12, 0.06, 0.43,1.00, 0.16,
0.00,-0.08,-0.04,0.16, 1.00),
ncol = 5) # correlation matrix for Imagines and larvae
set.seed(654321)
dx_t <- genCorGen(50, nvars = 5, params1 = l_t,
dist = "poisson", rho = r,
corstr = "cs", wide = TRUE)
# count data of treated plants
set.seed(654321)
dx_ut <- genCorGen(50, nvars = 5, params1 = l_ut,
dist = "poisson", rho = r,
corstr = "cs", wide = TRUE)
# count data of untreated plants
x_t <- dx_t[, -1] # delete ID column
x_ut <- dx_ut[, -1]
library(tidyverse)
x <- bind_rows(x_t, x_ut)
names(x) <- c("I", "L1", "L2", "L3", "L4")
plant <- seq(1, 100) # plant IDs
treatment <- c(rep("treated",50), rep("untreated",50))
# two different treatments
df <- data.frame(x, plant, treatment)
head(df)
df2 <- df %>%
gather("stage", "count", 1:5) # each insect stage becomes a row
df_c <- data.frame (stage = c("I", "L1", "L2", "L3", "L4"),
consumption = c(1/20.8, 1/1, 1/2.1, 1/8.4, 1/17.1))
# Consumption rates of the individual stages
df3 <- left_join(df2, df_c, by = "stage")
Assignment of consumption to the stages
head(df3)
library(glmmTMB)
offset_model <- glmmTMB(count ~ treatment + (1 | plant) +
offset(log(consumption)),
family = "poisson",
data = df3)
model with plant as random factor and consumption in offset
summary(offset_model)
library(DHARMa)
simulateResiduals(offset_model, plot = TRUE)
Unfortunately, the DHARMa residual plots look terrible. So I have the feeling that this is might not be the way to go.
Reading this post (In a Poisson model, what is the difference between using time as a covariate or an offset?) and this one (Can Weights and Offset lead to similar results in poisson regression?) makes me think that using consumption as a covariate or specifying it as weights is not the right way to go either.
Anyone got a better idea?