I want to model counts of an event in a pre-post design. A sample dataset could look like this:
dat <- tibble::tibble(
  day = 1:20,
  event = c(0, 0, 2, 5, 0, 10, 3, 0, 0, 0, 1, 3, 4, 0, 5, 0, 0, 2, 0, 10),
  group = c(rep("pre", 10), rep("post", 10))
)
In my real data there are definitely too many zeros for a Poisson process. I am therefore leaning towards fitting four models (Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial), comparing them, and then performing inference on the best one. However, I am not sure whether this approach is valid.
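To illustrate what I mean by "too many zeros", here is a rough sketch on the sample data above. It compares the observed number of zeros with the number a plain Poisson fit would expect. The names `m_pois`, `obs_zeros`, and `exp_zeros` are only for illustration, and I rebuild the data with base `data.frame()` so the snippet runs on its own.

```r
# Sample data from above, rebuilt with base R so the block is self-contained
dat <- data.frame(
  day = 1:20,
  event = c(0, 0, 2, 5, 0, 10, 3, 0, 0, 0, 1, 3, 4, 0, 5, 0, 0, 2, 0, 10),
  group = c(rep("pre", 10), rep("post", 10))
)

# Plain Poisson model with group as the only predictor
m_pois <- glm(event ~ group, family = poisson, data = dat)

# Observed number of days with zero events
obs_zeros <- sum(dat$event == 0)

# Expected number of zeros under the fitted Poisson model:
# sum over observations of P(Y = 0 | fitted mean)
exp_zeros <- sum(dpois(0, lambda = fitted(m_pois)))

c(observed = obs_zeros, expected = exp_zeros)
```

If the observed count is well above the expected one, that points to zero inflation, overdispersion, or both. This check alone does not tell the two apart.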
This is what I would like to do:
# fit Poisson model
m1 <- glm(event ~ group, family = "poisson", data = dat)
# get AIC for m1
m1.aic <- AIC(m1)
# fit neg binom model
m2 <- MASS::glm.nb(event ~ group, data = dat)
# get AIC for m2
m2.aic <- AIC(m2)
# fit zero-inflated neg binom model
m3 <- pscl::zeroinfl(event ~ group | group, dist = "negbin", data = dat)
# get AIC for m3
m3.aic <- AIC(m3)
# fit zero-inflated Poisson model
m4 <- pscl::zeroinfl(event ~ group | group, dist = "poisson", data = dat)
# get AIC for m4
m4.aic <- AIC(m4)
# summary of lowest AIC model
summary(m2)
# conclusion: no significant effect of group on number of events
# check predicted mean number of events per group
emmeans::emmeans(m2, ~ group, type = "response")
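For the comparison step, the four AIC values could be collected into one sorted table instead of being inspected one by one. This is only a sketch: it assumes the MASS and pscl packages are installed, refits the models so that it runs on its own, and the names `models` and `aic_tab` are made up for illustration.

```r
# Sample data, rebuilt with base R so the block is self-contained
dat <- data.frame(
  day = 1:20,
  event = c(0, 0, 2, 5, 0, 10, 3, 0, 0, 0, 1, 3, 4, 0, 5, 0, 0, 2, 0, 10),
  group = c(rep("pre", 10), rep("post", 10))
)

# Same four candidate models as above (MASS and pscl assumed to be installed)
m1 <- glm(event ~ group, family = poisson, data = dat)
m2 <- MASS::glm.nb(event ~ group, data = dat)
m3 <- pscl::zeroinfl(event ~ group | group, dist = "negbin", data = dat)
m4 <- pscl::zeroinfl(event ~ group | group, dist = "poisson", data = dat)

# Collect the models in a named list and compute AIC for each
models <- list(poisson = m1, negbin = m2, zinb = m3, zip = m4)
aic_tab <- data.frame(
  model = names(models),
  AIC = sapply(models, AIC),
  row.names = NULL
)

# Sort so that the lowest-AIC model comes first
aic_tab <- aic_tab[order(aic_tab$AIC), ]
aic_tab
```

On a dataset this small the differences in AIC may be slight, so the ranking should probably be read with caution.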
Do you see any problems with this? What would you do differently?