I am new to generalised linear models and want to use them for my fourth-year dissertation project in ecology. Please forgive any ignorance on my part. I have done my best to research this on my own, but I have reached a bit of a dead end and would like some help.

I have a response variable and four predictors for the GLM and am using a Gaussian distribution. I have run several GLMs in RStudio with different combinations of the predictors to find the best model. However, the best model (the one with the lowest AICc score) is a reduced model that uses only two of the predictors.

When reporting this in my dissertation, what do I say about the predictors that are not included in the model?

My current understanding is that I report the best model (with just the two predictors) and state that the other two were excluded from the model. Do I then ignore the other two predictors entirely, or should I analyse each of them separately, for example with a standard linear regression?

I started with multiple linear regression, but was running into problems because the coefficient estimates were huge and adjusted R-squared was low. I checked the diagnostics and the plots looked really good, so I was struggling to understand what was going on. I thought that in this scenario GLMs would be a better tool, but I've always found them intimidating.

I saw in tutorials that you can fit a GLM as you would a multiple linear regression, except that you specify the family distribution. When I did this with the full model, the coefficient estimates did not match what was plotted. For example, a scatter plot indicated a positive slope, but the estimate came back as 0.003 or as a highly negative slope. I understand that this is often due to multicollinearity, so I ran a VIF check and centred the variables that were causing the issue. I then read that it often happens because other predictors are suppressing the main predictors, so I decided to run multiple models.
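A minimal sketch of how this mismatch can arise, written in Python rather than R and using made-up numbers, not the data from this project. A Gaussian GLM with an identity link gives the same coefficient estimates as ordinary least squares, so the sketch uses the closed-form least-squares formulas for two predictors. The variables `x1`, `x2` and `y` are hypothetical: `y` is built so that `x1` has a negative effect once `x2` is held fixed, yet a scatter plot of `y` against `x1` alone would show a positive slope because the two predictors are strongly correlated.

```python
# Made-up, illustrative data: x1 and x2 are strongly correlated predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1]
# Hypothetical response: negative partial effect of x1, positive effect of x2.
y = [-1.0 * a + 3.0 * b for a, b in zip(x1, x2)]


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)

# Slope of y on x1 alone: what a scatter plot of y against x1 shows.
marginal_slope = s1y / s11

# Coefficients when x1 and x2 are fitted together (two-predictor least squares).
det = s11 * s22 - s12 ** 2
partial_x1 = (s22 * s1y - s12 * s2y) / det
partial_x2 = (s11 * s2y - s12 * s1y) / det

# With two predictors, VIF = 1 / (1 - r^2), r being their correlation.
r_squared = s12 ** 2 / (s11 * s22)
vif = 1.0 / (1.0 - r_squared)

print("slope of y on x1 alone:", marginal_slope)
print("coefficient of x1 in the joint model:", partial_x1)
print("coefficient of x2 in the joint model:", partial_x2)
print("VIF:", vif)
```

In this toy example the slope against `x1` alone is positive while its coefficient in the joint model is negative, and the VIF is far above the commonly used thresholds. Centring changes neither the correlation between `x1` and `x2` nor the VIF here, since both are computed from deviations about the means.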

I compared the models using AICc and ANOVA, which was fine, and I now have a reduced model that works well but drops two of the predictors. My question is about what this actually means and how I should go about reporting the results.
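For reference, a minimal sketch of the AICc calculation for Gaussian models, written in Python rather than R. The function name `gaussian_aicc` and the residual sums of squares are hypothetical, chosen only for illustration. The parameter count `k` is assumed to be the number of regression coefficients (including the intercept) plus one for the residual variance, and AICc adds the small-sample term 2k(k+1)/(n-k-1) to AIC.

```python
import math


def gaussian_aicc(rss, n, n_coef):
    """Return (AIC, AICc) for a Gaussian model fitted by least squares.

    rss    : residual sum of squares of the fitted model
    n      : number of observations
    n_coef : number of regression coefficients, including the intercept
    """
    k = n_coef + 1  # assumption: +1 for the estimated residual variance
    # Maximised Gaussian log-likelihood, with variance estimated as rss / n.
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = -2 * log_lik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    return aic, aicc


# Hypothetical numbers: 30 observations, full model with four predictors,
# reduced model with two predictors and a slightly larger RSS.
n_obs = 30
full = gaussian_aicc(rss=11.8, n=n_obs, n_coef=5)
reduced = gaussian_aicc(rss=12.0, n=n_obs, n_coef=3)

print("full model    AIC, AICc:", full)
print("reduced model AIC, AICc:", reduced)
```

With these made-up numbers the reduced model has the lower AICc because the two extra predictors reduce the residual sum of squares by less than the penalty they incur. That is a statement about penalised fit, not evidence that the dropped predictors have no effect.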

  • Welcome to Cross Validated! Are you doing some kind of stepwise regression? (It sounds like you are, even if you're not calling it stepwise regression.) Approaches like stepwise regression and best-subset selection have major issues. The gist of the points at the link is that, by doing the variable selection, the usual methods are operating under assumptions that are no longer true, and subsequent inferences are distorted, possibly severely. – Dave Jan 31 '24 at 22:18
  • Thanks for the response! Yes, I planned on using a stepwise regression. I've seen a few posts now that highlight the issues with this, but I'm not sure what else to do with my data. If I run a GLM on all of the predictors, the coefficients are strange even after centring for collinearity. Is it better to run a separate non-parametric regression on each predictor? – Kate Turland Feb 01 '24 at 10:52
  • To pass your thesis, I recommend using the techniques you have learned in your classes or have found in the modern literature of your field. To do a good job with the statistics, I recommend either collaborating with a statistician or spending years learning the subject so you can do it yourself. Frank Harrell's Regression Modeling Strategies textbook is a good resource on how to build complex models like this and make sense of them. The book would typically be studied during the second year of a postgraduate degree in statistics. – Dave Feb 01 '24 at 12:21
