Is it appropriate to use GLMM for the dataset I have

Question

I am interested in determining the influence of biotic (initial height) and abiotic (light, canopy, soil moisture content, soil nutrients, rainfall, and temperature) factors on the absolute height growth of a particular plant species. The response variable, absolute height growth (AHG, in cm), is the difference between the plant height in November (last census) and the plant height in September (first census). I have set up plots at three study sites: low (n=5), middle (n=5), and high (n=6) elevation. I was thinking of using the plots as the random effect and the rest of the predictors as fixed effects. However, I only have one value each for available soil phosphorus and potassium, regardless of elevation. The dataset can be found here: Dataset

Is it appropriate to use a GLMM for this dataset? If not, what are my options?

score 0 · Answer 1 · answered Dec 16 '22 at 22:39

With only a single value of soil phosphorus and potassium for all cases, you can't find any relationship between them and your outcome.

You have a similar problem with Light and Canopy. Each of the 3 Elevations has only a single value for each of those predictors, so you can't separate Elevation from Light or Canopy in the model. You can only include 1 of them; if you try to include more than 1, the software will typically keep 1 and drop the rest on its own. Elevation is probably the safest choice.
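To see that confounding in action, here is a minimal sketch in base R. The data are simulated, and the Light values and the outcome `y` are made up purely for illustration; only the structure (one Light value per Elevation, 5/5/6 plots) follows the description above.

```r
# Hypothetical data: Light takes exactly one value per Elevation.
set.seed(1)
d <- data.frame(
  Elevation = factor(rep(c("Low", "Mid", "High"), times = c(5, 5, 6)))
)
light_by_elev <- c(Low = 80, Mid = 55, High = 30)   # made-up values
d$Light <- unname(light_by_elev[as.character(d$Elevation)])
d$y <- rnorm(nrow(d))                               # placeholder outcome

# Light is an exact linear combination of the intercept and the
# Elevation dummy variables, so lm() cannot estimate it separately.
fit <- lm(y ~ Elevation + Light, data = d)
coef(fit)    # the Light coefficient is reported as NA
alias(fit)   # shows which terms Light is aliased with
```

The `NA` is the software "choosing on its own": it keeps Elevation and drops Light because Light was entered last.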

It's not a good idea to model a change score like AHG as an outcome with the baseline height as a predictor. See this page. The change is necessarily related to the baseline. Model the final height instead, which you can easily calculate from your data.

You currently have plot numbers 1 through 5 or 6 for each of the Elevations. If you just use the current plot numbers as your random-effect labels, there's a risk that the software will assume you have the same plot repeated at each elevation. You will avoid trouble if you give each plot a unique label (like H1 through H6, L1 through L5, M1 through M5). Then you can use those unique plot labels for your random effects.
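One way to build such labels in R, sketched on a toy data frame. The column names `Plot` and `PlotID` are assumptions for illustration; adjust them to whatever your dataset uses.

```r
# Toy version of the plot layout: numbers restart within each elevation.
d <- data.frame(
  Elevation = rep(c("Low", "Mid", "High"), times = c(5, 5, 6)),
  Plot      = c(1:5, 1:5, 1:6)
)

# Prefix each plot number with the first letter of its elevation,
# giving L1..L5, M1..M5, H1..H6.
d$PlotID <- factor(paste0(substr(d$Elevation, 1, 1), d$Plot))

nlevels(factor(d$Plot))   # 6 distinct labels if the raw numbers are used
nlevels(d$PlotID)         # 16 distinct plots with the unique labels
```

If you prefer to keep the original numbering, a random-effect term such as `(1 | Elevation:Plot)` in lme4 should have the same effect, since it treats each Elevation-by-Plot combination as its own group.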

A "GLMM" is a "generalized linear mixed model." The "generalized" part means that you model some function (the link function) of the expected outcome against the predictor variables, possibly with a non-normal error distribution. You might be able to use a standard linear model similar to ordinary least squares, but keeping a random effect (lmer() function in R). That's just an "LMM" (linear mixed model).
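As a rough sketch of what such an LMM could look like with the lme4 package, using simulated data rather than your dataset: the column names `Final.height` and `PlotID` are assumptions, and with the real data the final height would just be the September height plus AHG.

```r
library(lme4)

# Simulated stand-in for the real data: 16 plots, 10 plants per plot.
set.seed(42)
plots <- data.frame(
  PlotID    = factor(c(paste0("L", 1:5), paste0("M", 1:5), paste0("H", 1:6))),
  Elevation = factor(rep(c("Low", "Mid", "High"), times = c(5, 5, 6)))
)
d <- plots[rep(seq_len(nrow(plots)), each = 10), ]
d$Initial.height <- runif(nrow(d), 10, 100)
plot_effect <- rnorm(nlevels(d$PlotID), sd = 2)     # plot-level deviations
d$Final.height <- d$Initial.height + 5 +
  plot_effect[as.integer(d$PlotID)] + rnorm(nrow(d), sd = 3)

# Linear mixed model: final height as outcome, initial height and
# Elevation as fixed effects, a random intercept for each plot.
fit <- lmer(Final.height ~ Initial.height + Elevation + (1 | PlotID), data = d)
summary(fit)
```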

You also have a wide range of Initial.height values. You have to decide how you expect final height to change as a function of initial height. Sometimes a log transform of the height values works well. That would model the mean of the logs of the final values. Or you could use a generalized model with a gaussian family and a log link to model the log of the means of the final values. With a continuous predictor it's often a good idea to model it flexibly, like with a spline, instead of assuming a strictly linear association with the outcome.
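Two of these options, sketched with lme4 and the splines package on simulated data. The data-generating process, the column names `Final.height` and `PlotID`, and the choice of `df = 3` for the spline are all assumptions made for illustration, not recommendations for your data.

```r
library(lme4)
library(splines)

# Simulated stand-in: growth is multiplicative, so logs are a natural scale.
set.seed(7)
plots <- data.frame(
  PlotID    = factor(c(paste0("L", 1:5), paste0("M", 1:5), paste0("H", 1:6))),
  Elevation = factor(rep(c("Low", "Mid", "High"), times = c(5, 5, 6)))
)
d <- plots[rep(seq_len(nrow(plots)), each = 10), ]
d$Initial.height <- runif(nrow(d), 10, 100)
plot_effect <- rnorm(nlevels(d$PlotID), sd = 0.1)
d$Final.height <- exp(0.2 + log(d$Initial.height) +
  plot_effect[as.integer(d$PlotID)] + rnorm(nrow(d), sd = 0.1))

# Option 1: log-transform the heights (models the mean of the logs).
fit_log <- lmer(log(Final.height) ~ log(Initial.height) + Elevation +
                  (1 | PlotID), data = d)

# Option 2: keep the original scale but let the effect of initial
# height bend, using a natural cubic spline.
fit_spline <- lmer(Final.height ~ ns(Initial.height, df = 3) + Elevation +
                     (1 | PlotID), data = d)

fixef(fit_log)
fixef(fit_spline)
```

Residual plots from each fit, for example `plot(fit_log)`, are a reasonable way to judge which scale suits the real data better.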

Then you must decide what combinations of the predictors you want to model. For example, if you think that the effect of a predictor (like Initial.height) might depend on Elevation, include an interaction between that predictor and Elevation.
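A sketch of an Initial.height by Elevation interaction in lme4, again on simulated data with assumed column names (`Final.height`, `PlotID`). Comparing the models with and without the interaction is one possible way to judge whether it is needed, not the only one.

```r
library(lme4)

# Simulated stand-in for the real data: 16 plots, 10 plants per plot.
set.seed(11)
plots <- data.frame(
  PlotID    = factor(c(paste0("L", 1:5), paste0("M", 1:5), paste0("H", 1:6))),
  Elevation = factor(rep(c("Low", "Mid", "High"), times = c(5, 5, 6)))
)
d <- plots[rep(seq_len(nrow(plots)), each = 10), ]
d$Initial.height <- runif(nrow(d), 10, 100)
plot_effect <- rnorm(nlevels(d$PlotID), sd = 2)
d$Final.height <- d$Initial.height + 5 +
  plot_effect[as.integer(d$PlotID)] + rnorm(nrow(d), sd = 3)

# Main effects only: one Initial.height slope shared by all elevations.
fit_main <- lmer(Final.height ~ Initial.height + Elevation + (1 | PlotID),
                 data = d)

# Interaction: a separate Initial.height slope for each elevation.
fit_int <- lmer(Final.height ~ Initial.height * Elevation + (1 | PlotID),
                data = d)

# Likelihood-ratio comparison; anova() refits both models with ML first.
cmp <- anova(fit_main, fit_int)
cmp
```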
