
This might seem a stupid question, but I can't find a clear answer.

Say I am studying species richness (the outcome variable) in lakes. I have data from 20 lakes, over 4 years, with 3 values per year (early, mid, and late summer).
It is clear to me that LakeID should be a random factor, whichever model I consider, to account for pseudoreplication, as samples from the same lake are likely correlated.

Now, I would like to see, among other things, whether the area of the lake (its size, termed Lake_area) influences species richness. I wanted to include it as a fixed effect.

The model would be:

lme(Richness ~ Lake_area, random = ~ 1 | LakeID)

Is it correct to do so?

  • On the one hand, I feel this is redundant, as each lake has only one size (its area does not change over the study period). And by including LakeID as a random effect, I already account, to some extent, for variation in richness due to differences between lakes.
  • On the other hand, (i) a model comprising only Lake_area as a fixed effect, with no random factor, would surely be wrong, as it would fail to consider that some samples are not independent. (ii) While Lake_area is somehow "nested" in Lake, Lake_area does not explain all the variation that is due to LakeID. Depth, morphometry, catchment, light exposure, etc. all potentially play a role that I do not necessarily capture with Lake_area. (iii) If I want to fit a more complex model (which I do: I will have time in julian_days as a fixed effect, and likely other variables), I have to include LakeID as a random factor in my final model.
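
To make point (ii) concrete, here is a sketch of the random-intercept model in equation form. It assumes a Gaussian response, which may not suit count data such as richness. It writes i = 1, ..., 20 for the lake and j for the sample within a lake; the symbols \beta_0, \beta_1, u_i, \varepsilon_{ij}, \sigma_u and \sigma_\varepsilon are notation introduced here for illustration only:

$$
\text{Richness}_{ij} = \beta_0 + \beta_1\,\text{Lake\_area}_i + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
$$

Lake_area carries no j index because it is constant within a lake. In this sketch, u_i stands for whatever between-lake variation Lake_area leaves unexplained (depth, morphometry, catchment, and so on).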

One solution I see to test the effect of lake_area is to take the average richness per lake as the outcome variable, and fit this model

 lm(Mean(Richness_per_lake)~Lake_area)

But I lose power and I can't do that if I want to include other variables.

Any help would be much appreciated.

  • OK, while reading Ben Bolker's GLMM FAQ, I came across a section referring to a [blog post](https://www.muscardinus.be/2017/08/fixed-and-random/) from Thierry Onkelinx that answers my question. It says that you can include a variable both as a fixed and a random factor if and only if it is discrete. In my case, as Lake_area is discrete, the model Richness ~ Lake_area + (1|Lake_area) would be valid. I think then that Richness ~ Lake_area + (1|LakeID) would work too, as each LakeID corresponds to a single value of Lake_area and vice versa. (ctd..) – Hugo Sentenac Feb 25 '22 at 10:23
  • (..Ctd) However, while the estimates and p-values are similar with both models (lme function of the nlme package), they are not exactly the same. I don't know what I'm missing here. – Hugo Sentenac Feb 25 '22 at 10:25
  • PS: taking the mean per group and then doing regression is not recommended. See https://stats.stackexchange.com/questions/78451/non-linear-regression-on-graphs-with-multiple-y-per-x-values – Hugo Sentenac Feb 25 '22 at 10:28
  • I found a similar question, and based on the answers it is OK and even recommended to use a group-level variable such as lake size as a fixed effect: https://stats.stackexchange.com/questions/284314/fixed-effect-that-is-constant-inside-each-of-the-random-groups-in-a-mixed-model – Marta Cz-C Jul 12 '22 at 14:56
  • This question is also related https://stats.stackexchange.com/questions/223439/use-of-variables-at-the-group-level-in-linear-mixed-models – Marta Cz-C Jul 13 '22 at 15:12
