I'm not sure of the best way to ask this but I'll try:
I'm using negative binomial models to explore how habitat variables relate to species abundance. I found 4 significant factors in my full model. I then wanted to investigate the relationship between abundance and each factor individually, both statistically and by plotting the relationships. The general form of my model is:
library(MASS) # provides glm.nb()
Model.full <- glm.nb(SpeciesCounts ~ X1 + X2 + X3 + X4, data = dataset)
First question: When variables X1-X4 are all included in my full model, every one is significant (P < 0.05), but when I fit submodels with each variable on its own, some are no longer significant. Is there a theoretical explanation for why all four variables would be significant together but not individually?
Below is the output of the full model; I'm looking at the variable DistShelf in particular:
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) 19.4004609  7.9092487   2.453 0.014172 *
Slope       -0.0522097  0.0140412  -3.718 0.000201 ***
RugosityM    1.0193253  0.2121319   4.805 1.55e-06 ***
RugosityH    1.0412758  0.2891510   3.601 0.000317 ***
DistShelf   -0.0002885  0.0001066  -2.706 0.006817 **
Lat         -0.6546487  0.2163872  -3.025 0.002483 **
And on its own:
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.563e+00  1.286e-01 -35.469   <2e-16 ***
DistShelf   -1.130e-04  9.794e-05  -1.154    0.249
I was expecting the opposite: that DistShelf would show a stronger relationship on its own.
Second question: Is refitting a separate glm for each factor a legitimate way to investigate the relationship with each variable on its own? For example:
Model.full <- glm.nb(SpeciesCounts ~ X1 + X2 + X3 + X4, data = dataset)
# Now, can I follow up with how SpeciesCounts responds to each variable?
Model.2 <- glm.nb(SpeciesCounts ~ X1, data = dataset) # 1st variable alone
Model.3 <- glm.nb(SpeciesCounts ~ X2, data = dataset) # 2nd variable alone, etc.
Or should I use the coefficients estimated by Model.full to create individual plots of SpeciesCounts ~ X1, SpeciesCounts ~ X2, etc.?
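To make that second option concrete, here is a rough sketch of what I have in mind: predict from the full model while varying one variable and holding the others fixed at their means. The data below are simulated stand-ins (the coefficients, sample size, and dispersion are made up for illustration, not my real dataset), and I'm not certain that holding the other covariates at their means is the right choice.

```r
library(MASS)  # provides glm.nb()

set.seed(1)
# Simulated stand-in data (hypothetical values, not my real dataset)
n <- 300
dataset <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n), X4 = rnorm(n))
mu <- exp(0.5 + 0.4 * dataset$X1 - 0.3 * dataset$X2 +
          0.2 * dataset$X3 - 0.2 * dataset$X4)
dataset$SpeciesCounts <- rnbinom(n, size = 2, mu = mu)

Model.full <- glm.nb(SpeciesCounts ~ X1 + X2 + X3 + X4, data = dataset)

# Vary X1 over its observed range; hold X2-X4 at their means
newdat <- data.frame(
  X1 = seq(min(dataset$X1), max(dataset$X1), length.out = 100),
  X2 = mean(dataset$X2),
  X3 = mean(dataset$X3),
  X4 = mean(dataset$X4)
)

# Predict on the link (log) scale, then back-transform to counts with exp()
pred <- predict(Model.full, newdata = newdat, type = "link", se.fit = TRUE)
newdat$fit <- exp(pred$fit)
newdat$lwr <- exp(pred$fit - 1.96 * pred$se.fit)  # approximate 95% band
newdat$upr <- exp(pred$fit + 1.96 * pred$se.fit)

# Raw counts with the full-model fitted curve for X1 overlaid
plot(SpeciesCounts ~ X1, data = dataset, col = "grey60")
lines(fit ~ X1, data = newdat, lwd = 2)
lines(lwr ~ X1, data = newdat, lty = 2)
lines(upr ~ X1, data = newdat, lty = 2)
```

With my real model I would presumably have to fix the factor Rugosity at one of its levels rather than at a mean, and repeat the same steps for each of the other variables.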
Thoughts?