I am having trouble understanding the output of a GLM I am trying to run with R package lme4. Here is an example of what I would like to achieve with some mock data:
Suppose that I have 3 geographical regions (north, south and east) in which I have planted some tomato plants, apple trees and orange trees. At each location, I have planted 5 of each. At harvest time, I have measured the number of fruits given by each plant and also the height of each plant. I have repeated this process during 4 years, and I am treating each year as a different replicate of the same experiment. My data thus looks like this:
plant region year height n_fruits
1 tomato north year_2001 14.5 138
2 tomato north year_2003 11.5 107
3 tomato north year_2005 12.0 116
4 tomato north year_2007 13.0 110
5 tomato north year_2001 15.0 132
6 tomato north year_2003 11.0 114
Here is the full data frame for reproducibility:
data <- structure(list(plant = c("tomato", "tomato", "tomato", "tomato",
"tomato", "tomato", "tomato", "tomato", "tomato", "tomato", "tomato",
"tomato", "tomato", "tomato", "tomato", "tomato", "tomato", "tomato",
"tomato", "tomato", "tomato", "tomato", "tomato", "tomato", "tomato",
"tomato", "tomato", "tomato", "tomato", "tomato", "tomato", "tomato",
"tomato", "tomato", "tomato", "tomato", "tomato", "tomato", "tomato",
"tomato", "tomato", "tomato", "tomato", "tomato", "tomato", "tomato",
"tomato", "tomato", "tomato", "tomato", "tomato", "tomato", "tomato",
"tomato", "tomato", "tomato", "tomato", "tomato", "tomato", "tomato",
"apple", "apple", "apple", "apple", "apple", "apple", "apple",
"apple", "apple", "apple", "apple", "apple", "apple", "apple",
"apple", "apple", "apple", "apple", "apple", "apple", "apple",
"apple", "apple", "apple", "apple", "apple", "apple", "apple",
"apple", "apple", "apple", "apple", "apple", "apple", "apple",
"apple", "apple", "apple", "apple", "apple", "apple", "apple",
"apple", "apple", "apple", "apple", "apple", "apple", "apple",
"apple", "apple", "apple", "apple", "apple", "apple", "apple",
"apple", "apple", "apple", "apple", "orange", "orange", "orange",
"orange", "orange", "orange", "orange", "orange", "orange", "orange",
"orange", "orange", "orange", "orange", "orange", "orange", "orange",
"orange", "orange", "orange", "orange", "orange", "orange", "orange",
"orange", "orange", "orange", "orange", "orange", "orange", "orange",
"orange", "orange", "orange", "orange", "orange", "orange", "orange",
"orange", "orange", "orange", "orange", "orange", "orange", "orange",
"orange", "orange", "orange", "orange", "orange", "orange", "orange",
"orange", "orange", "orange", "orange", "orange", "orange", "orange",
"orange"),
region = c("north", "north", "north", "north", "north",
"north", "north", "north", "north", "north", "north", "north",
"north", "north", "north", "north", "north", "north", "north",
"north", "south", "south", "south", "south", "south", "south",
"south", "south", "south", "south", "south", "south", "south",
"south", "south", "south", "south", "south", "south", "south",
"east", "east", "east", "east", "east", "east", "east", "east",
"east", "east", "east", "east", "east", "east", "east", "east",
"east", "east", "east", "east", "north", "north", "north", "north",
"north", "north", "north", "north", "north", "north", "north",
"north", "north", "north", "north", "north", "north", "north",
"north", "north", "south", "south", "south", "south", "south",
"south", "south", "south", "south", "south", "south", "south",
"south", "south", "south", "south", "south", "south", "south",
"south", "east", "east", "east", "east", "east", "east", "east",
"east", "east", "east", "east", "east", "east", "east", "east",
"east", "east", "east", "east", "east", "north", "north", "north",
"north", "north", "north", "north", "north", "north", "north",
"north", "north", "north", "north", "north", "north", "north",
"north", "north", "north", "south", "south", "south", "south",
"south", "south", "south", "south", "south", "south", "south",
"south", "south", "south", "south", "south", "south", "south",
"south", "south", "east", "east", "east", "east", "east", "east",
"east", "east", "east", "east", "east", "east", "east", "east",
"east", "east", "east", "east", "east", "east"),
year = c("year_2001",
"year_2003", "year_2005", "year_2007", "year_2001", "year_2003",
"year_2005", "year_2007", "year_2001", "year_2003", "year_2005",
"year_2007", "year_2001", "year_2003", "year_2005", "year_2007",
"year_2001", "year_2003", "year_2005", "year_2007", "year_2001",
"year_2003", "year_2005", "year_2007", "year_2001", "year_2003",
"year_2005", "year_2007", "year_2001", "year_2003", "year_2005",
"year_2007", "year_2001", "year_2003", "year_2005", "year_2007",
"year_2001", "year_2003", "year_2005", "year_2007", "year_2001",
"year_2003", "year_2005", "year_2007", "year_2001", "year_2003",
"year_2005", "year_2007", "year_2001", "year_2003", "year_2005",
"year_2007", "year_2001", "year_2003", "year_2005", "year_2007",
"year_2001", "year_2003", "year_2005", "year_2007", "year_2001",
"year_2003", "year_2005", "year_2007", "year_2001", "year_2003",
"year_2005", "year_2007", "year_2001", "year_2003", "year_2005",
"year_2007", "year_2001", "year_2003", "year_2005", "year_2007",
"year_2001", "year_2003", "year_2005", "year_2007", "year_2001",
"year_2003", "year_2005", "year_2007", "year_2001", "year_2003",
"year_2005", "year_2007", "year_2001", "year_2003", "year_2005",
"year_2007", "year_2001", "year_2003", "year_2005", "year_2007",
"year_2001", "year_2003", "year_2005", "year_2007", "year_2001",
"year_2003", "year_2005", "year_2007", "year_2001", "year_2003",
"year_2005", "year_2007", "year_2001", "year_2003", "year_2005",
"year_2007", "year_2001", "year_2003", "year_2005", "year_2007",
"year_2001", "year_2003", "year_2005", "year_2007", "year_2001",
"year_2003", "year_2005", "year_2007", "year_2001", "year_2003",
"year_2005", "year_2007", "year_2001", "year_2003", "year_2005",
"year_2007", "year_2001", "year_2003", "year_2005", "year_2007",
"year_2001", "year_2003", "year_2005", "year_2007", "year_2001",
"year_2003", "year_2005", "year_2007", "year_2001", "year_2003",
"year_2005", "year_2007", "year_2001", "year_2003", "year_2005",
"year_2007", "year_2001", "year_2003", "year_2005", "year_2007",
"year_2001", "year_2003", "year_2005", "year_2007", "year_2001",
"year_2003", "year_2005", "year_2007", "year_2001", "year_2003",
"year_2005", "year_2007", "year_2001", "year_2003", "year_2005",
"year_2007", "year_2001", "year_2003", "year_2005", "year_2007",
"year_2001", "year_2003", "year_2005", "year_2007"),
height = c(14.5,
11.5, 12, 13, 15, 11, 14.5, 15.5, 13, 13, 10, 11, 11, 13.5, 12,
14, 12.5, 13.5, 16.5, 12, 14, 15, 11, 13, 10.5, 11.5, 12, 8.5,
12, 14.5, 12, 12.5, 13, 12.5, 11, 14, 13.5, 14, 10.5, 13.5, 12,
14, 13, 14, 12.5, 12.5, 14, 9, 12.5, 13.5, 13.5, 12.5, 14.5,
12, 11.5, 10, 10, 11.5, 12.5, 13, 24, 30, 23, 25, 23, 26, 23,
25, 28, 20, 29, 24, 29, 24, 23, 25, 29, 29, 24, 28, 31, 24, 27,
24, 23, 27, 22, 27, 21, 23, 21, 22, 19, 26, 29, 28, 28, 25, 24,
28, 26, 26, 24, 23, 34, 26, 22, 21, 25, 30, 26, 32, 27, 24, 24,
21, 17, 27, 21, 25, 52, 66, 50, 50, 44, 54, 50, 50, 44, 44, 52,
52, 40, 38, 52, 60, 52, 52, 50, 66, 50, 54, 52, 44, 46, 54, 50,
44, 54, 42, 58, 52, 52, 46, 50, 50, 44, 50, 40, 46, 44, 46, 58,
50, 56, 58, 48, 40, 46, 54, 48, 52, 58, 58, 48, 48, 58, 52, 54,
52),
n_fruits = c(138, 107, 116, 110, 132, 114, 120, 143, 121,
129, 116, 107, 108, 121, 106, 128, 105, 137, 123, 131, 105, 103,
102, 106, 92, 109, 97, 98, 107, 110, 103, 106, 99, 100, 96, 119,
117, 110, 95, 107, 163, 142, 158, 133, 153, 128, 128, 114, 129,
132, 160, 144, 147, 124, 123, 120, 116, 131, 150, 126, 148, 91,
101, 179, 127, 110, 81, 129, 126, 82, 141, 95, 217, 153, 124,
133, 113, 159, 99, 94, 137, 55, 75, 61, 53, 93, 41, 77, 54, 45,
62, 59, 49, 97, 99, 82, 87, 61, 68, 107, 194, 203, 271, 142,
316, 282, 121, 158, 169, 197, 144, 186, 267, 207, 215, 219, 116,
244, 155, 164, 124, 130, 63, 86, 116, 137, 52, 95, 100, 39, 138,
81, 76, 61, 143, 65, 93, 66, 104, 211, 27, 60, 55, 71, 32, 71,
87, 42, 47, 40, 78, 38, 89, 49, 1, 34, 72, 37, 38, 50, 79, 174,
220, 255, 244, 202, 156, 153, 114, 188, 176, 202, 136, 277, 192,
273, 95, 146, 96, 154)),
row.names = c(NA, -180L),
class = "data.frame")
My hypothesis is that the number of fruits given by a plant will be largely determined by that plant's height. The specific relationship between height and number of fruits, however, may change across plant types and regions. Here are some plots to illustrate what I mean: if I just plot the number of fruits versus the height of the plant, I see no correlation:
But if I look at each plant type individually, I see positive correlations:
And if I additionally split by region, I see that these correlations are variable, for instance for apple trees the correlation is stronger in the south than in the north:
Also note that my hypothesis is that the year will have no effect, however sometimes it might be the case that the apparent correlations are in fact driven by yearly variation in height and number of fruits. I have been told that I should use a GLM in order find in which cases (i.e., for which plants and in which regions) the height (and not the year) is the main determinant of the number of fruits.
I have no experience with GLMs so I am a bit lost here. I have tried using the lme4 package and running:
library(lme4)
mymodel <- lmer(n_fruits ~ plant + region + (1|height) + (1|year),
data = data)
summary(mymodel)
Which, first of all, gives a warning that I am not sure how to interpret:
boundary (singular) fit: see help('isSingular')
I think this has to do with the random effects being too small, but I am not sure I am understanding this correctly. Then, the output is:
Linear mixed model fit by REML ['lmerMod']
Formula: n_fruits ~ plant + region + (1 | height) + (1 | year)
Data: data
REML criterion at convergence: 1795
Scaled residuals:
Min 1Q Median 3Q Max
-2.0645 -0.6951 -0.0776 0.5547 3.1917
Random effects:
Groups Name Variance Std.Dev.
height (Intercept) 210.6 14.51
year (Intercept) 0.0 0.00
Residual 1347.2 36.70
Number of obs: 180, groups: height, 44; year, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 180.535 7.521 24.004
plantorange -21.498 9.229 -2.329
planttomato -11.643 8.999 -1.294
regionnorth -53.715 7.161 -7.501
regionsouth -91.735 6.998 -13.109
Correlation of Fixed Effects:
(Intr) plntrn plnttm rgnnrt
plantorange -0.583
planttomato -0.586 0.486
regionnorth -0.464 -0.002 -0.022
regionsouth -0.471 0.009 -0.010 0.520
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
And I honestly have no idea how to interpret that. Ideally I was expecting to get some metric of significance for both the height and the year, under each combination of plant type and region. My hypothesis is that plant height should be at least sometimes significant, and year should not be significant in any context. But I don't know how to extract this information, and in fact I am not even sure that this is something a GLM will give?
I would appreciate any guidance, since as you can see my lack of experience with this makes me be quite lost.
Thanks in advance!



heightwithn_fruitsthen you need to haveheightas a "fixed-effect" predictor in your model, not a random effect as you have coded. Also, if you want to know specific combinations ofregionandplantthat alter the association betweenheightandn_fruits, they also should be treated as fixed effects. Things modeled as "random effects" aren't really evaluated individually; their distribution is modeled. See this page for much guidance and discussion. – EdM Nov 03 '22 at 20:41