I have a dataset in which I randomly sampled housing developments and then, within each development, systematically sampled every habitat patch. Each observation is a patch (patch_ID), with variables recorded for the patch: project_ID (the development it came from), Area_ha (the area of the patch, in hectares), and Pre_post (whether the patch comes from a pre-development or a post-development site plan). I would like to fit a GLMM to investigate whether the size (Area_ha) of habitat patches differs before and after development (i.e. "pre" vs. "post" in the Pre_post variable), whilst accounting for my grouping factor (project_ID). I am using a Gamma GLMM because my Area_ha data are heavily right skewed and contain no zero values. Area_ha is continuous, Pre_post is a factor with two levels ("pre", "post"), and project_ID is a factor with 25 levels (one for each of the 25 developments).
I successfully conducted this GLMM on a smaller dataset last month, using the code:
glmm_gamma <- glmer(Area_ha ~ Pre_post + (1 | project_ID),
                    family = Gamma(link = "log"),
                    data = size3)
However, I have since increased the size of my dataset, and the same code now gives me this convergence warning:
Warning: Model failed to converge with max|grad| = 0.0209587 (tol = 0.002, component 1)
The data is still right skewed and I have removed the most troublesome outlier.
I have tried the following options to troubleshoot but to no avail:
# Rescale and centre the response:
mu <- mean(size3$Area_ha, na.rm = TRUE)
sigma <- sd(size3$Area_ha, na.rm = TRUE)
size3$Area_ha_scaled <- (size3$Area_ha - mu) / sigma
# But I then realised I can't do this: centring produces negative values, and the Gamma distribution requires a strictly positive response
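A minimal sketch of rescaling without centring, which keeps the response strictly positive. With a log link, dividing the response by a positive constant should only shift the intercept by the log of that constant. The data frame `demo` and its values are simulated for illustration and are not taken from size3.

```r
# Hypothetical right-skewed, strictly positive areas (not the real data)
set.seed(1)
demo <- data.frame(Area_ha = rgamma(200, shape = 1, rate = 2))

# Scale only: dividing by the mean keeps every value positive
scale_const <- mean(demo$Area_ha, na.rm = TRUE)
demo$Area_ha_scaled <- demo$Area_ha / scale_const

# For comparison, centring pushes below-average values negative
demo$Area_ha_centred <- (demo$Area_ha - scale_const) / sd(demo$Area_ha)

all(demo$Area_ha_scaled > 0)   # scaled response is still valid for a Gamma model
any(demo$Area_ha_centred < 0)  # centred response is not
```

Whether this kind of rescaling helps with the convergence warning here is untested; it is only shown as the positive-valued alternative to the centring attempt above.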
Checking for a singular fit:
tt <- getME(glmm_gamma,"theta")
ll <- getME(glmm_gamma,"lower")
min(tt[ll==0])
# 0.8909557, which is well above zero, so the fit does not appear to be singular
Trying the bobyqa optimizer with more iterations:
glmm_gamma <- glmer(Area_ha ~ Pre_post + (1 | project_ID),
                    family = Gamma(link = "log"),
                    data = size3,
                    control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1000000)))
# Still didn't work
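Another check that might be worth trying is refitting the model with every available optimizer via `allFit()` from lme4 and comparing the estimates. A rough sketch follows, using simulated data as a stand-in for size3; the data frame `sim`, the effect sizes and the Gamma shape are all assumptions made up for illustration.

```r
library(lme4)

# Simulated stand-in for size3: 25 projects, 20 patches each (hypothetical values)
set.seed(42)
n_proj <- 25
n_per  <- 20
sim <- data.frame(
  project_ID = factor(rep(seq_len(n_proj), each = n_per)),
  Pre_post   = factor(rep(c("pre", "post"), times = n_proj * n_per / 2),
                      levels = c("pre", "post"))
)

# Gamma response with a log-scale project effect and a pre/post effect
proj_eff <- rnorm(n_proj, sd = 0.5)
mu_sim   <- exp(-1 + 0.3 * (sim$Pre_post == "post") +
                proj_eff[as.integer(sim$project_ID)])
shape    <- 1.5
sim$Area_ha <- rgamma(nrow(sim), shape = shape, rate = shape / mu_sim)

fit <- glmer(Area_ha ~ Pre_post + (1 | project_ID),
             family = Gamma(link = "log"), data = sim)

# Refit with each installed optimizer and compare the results
fits <- allFit(fit)
ss <- summary(fits)
ss$fixef  # fixed-effect estimates, one row per optimizer
ss$llik   # log-likelihood reached by each optimizer
```

If the fixed effects and log-likelihoods agree closely across optimizers, the warning is often treated as a false positive, although that interpretation is a judgment call rather than a guarantee.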
I'm completely stumped as to what to do next, and would really appreciate some help! I'm confused as to why adding more data has made my model stop converging. I'm not very confident with statistics, so hopefully this community can help!
I have put my data and some of my code here: https://github.com/sirianmckellen/stackex.git If anyone has any ideas on how I can move forward, whether this is a solvable issue or whether I need a different type of model (and if so, what I should consider), I'd be SO grateful! Thank you all!
