Nested random effect, and converting from R formula syntax to math

Question

I have an experiment where I am looking at species-specific responses to a treatment application. Within each species, I have 3 genotypes, arbitrarily chosen to ensure we aren't detecting genotype-specific affects that get attributed to species. Each genotype has a number of replicates within it. Thus, specific genotypes are nested within a species, and genotype 1 from species A would not necessarily be comparable to genotype 1 from species B. My understanding of how to specify this for lme4 would be

response ~ treatment * species + (1|genotype)

or alternatively

response ~ treatment * (species|genotype)

I am trying to implement this into a bayesian hierarchical model using Turing.jl. I am struggling to come up with the mathematical notation for this model, and am specifically having trouble conceptualizing whether genotype needs to be nested within species or whether species can stand alone. It seems that each genotype represents a block or group, with the species effect being applied to the entire block, while the treatment effect is applied to each individual within the species block.

I think the model would be written like this:

$$ y \sim Normal(\hat{y}, \sigma) $$ where $$ \hat{y} \sim \alpha_{[genotype]} + \alpha_{[species]} + \beta_{[species]} + \beta_{[treatment]} + \gamma_{[species][treatment]} + \epsilon $$

along with the priors for each of these parameters. Does this seem correct? Am I making a mistake in how the model formula is specified? I am unsure if I can have distinct $\alpha$ parameters for both species and genotype, or if I need to treat one as a pior for the other eg. $$ \alpha_{genotype} \sim Normal(\alpha_{species},\sigma) $$ I can convince myself of this since each genotype is representative of the underlying species-specific distribution for the response variable in question.

This is a good start! Regarding genotype and the correct lme4 formula, you would want to have separate random intercepts for each genotype and species: lmer(response ~ treatment + (1|species) + (1|genotype), data=dat). — Erik Ruzek, Nov 28 '22 at 20:48
I am curious to understand the actual effects that species has on the response, rather than just account for the introduced variation, so that makes me think that it would be a fixed effect rather than a random effect. Does that make sense? — Isaac, Nov 29 '22 at 17:46
When you say actual effects, are you talking about variation in the treatment effect by species? — Erik Ruzek, Nov 29 '22 at 20:05
Yes, I am measuring leaf chemistry, and am expecting latent variation among species, as well as an interaction between species and treatment showing that species have different responses to the treatments. — Isaac, Nov 30 '22 at 20:01

score 2 · Accepted Answer · answered Nov 30 '22 at 23:23

It seems that a key question that you are wrestling in this analysis is how to treat species. In mixed or multilevel models, a factor such as species can be treated as a random effect or a fixed effect. Treating species as random means that the species in your study are viewed as representative of a larger population of similar species. Random effects are nice for lots of other reasons that are worth considering, including that they employ partial pooling of predicted means for individual species and provide standard errors/deviations of the predictions. Given your interest in whether species have different responses to the treatments, the associated lme4 model would be as follows:

lmer(response ~ treatment + (treatment|species) + (1|genotype), data=dat)

This gives you the latent variation among species (the random intercept for species) as well as the differential effect of treatment by species (the random treatment slope that allows the treatment effect to vary by species). Note that you could also allow the treatment effect to vary by genotype by altering the genotype random effect to (treatment|genotype). If, as you expect, all variation is at the species level, then random treatment slope should show little variation by genotype (small standard deviation or variance).

If instead, you treated species as a fixed effect, you are only interested in the species you have collected data on and not generalizing beyond those species. The corresponding lme4 syntax for this model is:

lmer(response ~ treatment*species + (1|genotype), data=dat)

Another reason researchers might treat a factor as fixed is because they have very few levels of that factor (say less than 10) and random effects for such a small number of levels can be problematic.

Nested random effect, and converting from R formula syntax to math

1 Answers1