This might seem a stupid question, but I can't find a clear answer.
Say I am studying species richness (the outcome variable) in lakes. I have data from 20 lakes over 4 years, with 3 values per year (early, mid, and late summer).
It is clear to me that LakeID should be a random factor, whatever the model I consider, to account for pseudoreplication as samples from the same lake are likely correlated.
Now, I would like to see, among other things, whether the area of the lake (its size, termed Lake_area) influences species richness. I want to include it as a fixed effect.
The model would be:
lme(Richness ~ Lake_area, random = ~ 1 | LakeID)
Is it correct to do so?
- On the one hand, I feel this is redundant, as each lake has only one size (its area does not change over the study period). And by including LakeID as a random effect, I already account for variation in richness due to differences between lakes.
- On the other hand, (i) a model with only Lake_area as a fixed effect and no random factor would surely be wrong, as it would fail to consider that samples from the same lake are not independent. (ii) While Lake_area is somehow "nested" in LakeID, Lake_area does not explain all the variation that is due to LakeID. Depth, morphometry, catchment, light exposure, etc. all potentially play a role that I do not necessarily capture with Lake_area. (iii) If I want to fit a more complex model (which I do: I will have time in julian_days as a fixed effect, and likely other variables), I have to include LakeID as a random factor in my final model.
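To make the two points above concrete, here is how I understand the random-intercept model when written out. The notation is my own assumption and not taken from any reference: $i$ indexes lakes, $j$ indexes samples within a lake, and both error terms are assumed normal and independent of each other.

$$\text{Richness}_{ij} = \beta_0 + \beta_1\,\text{Lake\_area}_i + u_i + \varepsilon_{ij}, \qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$$

Written this way, Lake_area carries only the index $i$, so $\beta_1$ can only be informed by differences between lakes, while $u_i$ would stand for whatever between-lake variation is left over once area is accounted for (depth, morphometry, catchment, etc.).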
One solution I see to test the effect of Lake_area is to take the average richness per lake as the outcome variable (one value per lake, here called Mean_richness) and fit this model:
lm(Mean_richness ~ Lake_area)
But I lose power and I can't do that if I want to include other variables.
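For comparison, this is what I think the lake-mean model amounts to if the random-intercept model holds. It is only a sketch under my own assumptions: $i$ indexes lakes, $u_i$ is a lake-level random intercept with variance $\sigma_u^2$, the within-lake errors are independent with variance $\sigma^2$, and $n_i$ is the number of samples in lake $i$ (12 if nothing is missing: 3 per year over 4 years).

$$\overline{\text{Richness}}_{i} = \beta_0 + \beta_1\,\text{Lake\_area}_i + u_i + \bar{\varepsilon}_{i}, \qquad \operatorname{Var}\!\left(u_i + \bar{\varepsilon}_{i}\right) = \sigma_u^2 + \frac{\sigma^2}{n_i}$$

So the averaged model has 20 observations, one per lake, and its residual mixes the between-lake variance with the averaged within-lake noise.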
Any help would be much appreciated.
I don't know what I'm missing here.
– Hugo Sentenac Feb 25 '22 at 10:25