I've created a mixed effects model using lme4 in R:
model <- lmer(GrowthRate ~ I(Year-1970) * factor(PlotType) + (1 + I(Year-1970)|Plot), data = dat)
Or, as a working, reproducible example for SE using the built-in CO2 data:
model <- lmer(uptake ~ conc * factor(Type) + (1 + conc|Plant), data = CO2)
Next, I use predict to determine the trend:
pred <- predict(model, re.form = NA)
I want to calculate 95% confidence intervals around my trend line(s), but predict.merMod does not calculate standard errors. The authors suggest using bootMer:
There is no option for computing standard errors of predictions because it is difficult to define an efficient method that incorporates uncertainty in the variance parameters; we recommend bootMer for this task.
So I use bootMer:
boot <- bootMer(model, predict, nsim = 10000, re.form = NA)
This is where I'm stuck.
How do I extract standard errors from bootMer? (and how do I convert these to CIs?)
Attempts:
I. This source suggests doing the following for a single predicted value:
b <- bootMer(fm1, nsim=100, function(x) predict(x, newdata = data.frame(Year = 1990), re.form=NA))
(paraphrasing):
b$t lists the output of the function we applied to each of the bootstrap resamples, in this case the fitted values at the given predicted value of x. The standard error of our predicted value can be estimated simply as the standard deviation of the sampling distribution, sd(b$t), and the confidence interval as the quantiles of that distribution (in this case, a 95% confidence interval): quantile(b$t, probs = c(0.025, 0.975)).
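To make this concrete for myself, I tried the same recipe on the CO2 working example from above, bootstrapping just the first population-level fitted value (an arbitrary choice, purely to illustrate the single-value case):
#single predicted value, CO2 working example (first fitted value chosen arbitrarily):
b1 <- bootMer(model, function(x) predict(x, re.form = NA)[1], nsim = 100)
sd(b1$t)                                #bootstrap standard error of that one value
quantile(b1$t, probs = c(0.025, 0.975)) #percentile 95% CI for that one value
This does give me one SE and one CI, as described.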
However, does this approach "scale up" to calculate the CIs for the entire trend line(s)?
- It doesn't appear to, since sd(boot$t) only gives a single value, just as it did with b$t in the example above.
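My guess is that the column-wise equivalent would look something like the following, but this is exactly the part I'm unsure about:
#my guess at a column-wise version (one SE and one CI per fitted value):
se.fit <- apply(boot$t, 2, sd)
ci.fit <- t(apply(boot$t, 2, quantile, probs = c(0.025, 0.975)))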
II. This source uses an entirely different approach:
#Bootstrapped CIs: [modified slightly for this question]
#Create model matrix:
mm <- model.matrix(~ I(Year - 1970) * factor(PlotType), dat)
#define a function to be applied to the nsim simulations:
#(the function basically gets a merMod object and returns the fitted values)
predFun <- function(.) mm %*% fixef(.)
#supply function to bootMer:
bb <- bootMer(model, FUN = predFun, nsim = 200) #do this 200 times
#as we did this 200 times the 95% CI will be bordered by the 5th and 195th value <<--- #WHAT DOES THIS MEAN??
bb_se <- apply(bb$t, 2, function(x) x[order(x)][c(5, 195)])
blo <- bb_se[1,]
bhi <- bb_se[2,]
I don't understand 2 things from this approach:
1. Why they created predFun(), and whether that is more valid than simply using predict(re.form = NA) (see the sanity check just after this list).
2. How their apply approach (and the fact that they never multiply the SEs by 1.96) actually gets them 95% confidence intervals...
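For point 1, here is a sanity check I ran on the CO2 working example; as far as I can tell it returns TRUE, i.e. the model-matrix route and predict(re.form = NA) produce the same population-level fitted values:
#sanity check for point 1 (CO2 working example):
mm2 <- model.matrix(~ conc * factor(Type), data = CO2)
all.equal(as.numeric(mm2 %*% fixef(model)),
          unname(predict(model, re.form = NA)))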
III. When I look at the output of bootMer (i.e., by printing my saved object boot in the console with print(boot)), I get the following output, with length(pred) rows:
Call:
bootMer(x = model, FUN = predict, nsim = 10000, re.form = NA)
Bootstrap Statistics :
original bias std. error
t1* 8.235689 -0.239843268 1.0835271
t2* 8.154482 -0.232643708 1.0579350
t3* 8.046205 -0.223044295 1.0265019
.....
So it appears bootMer is already calculating standard errors (std.error). However, I can't for the life of me figure out how to access these values.
I assume this would be the easiest approach if anyone knows how to do so...
- Update: I found out I can reproduce the std. error values from print(boot) using apply(boot$t, 2, sd). However, this now calls into question whether the boot output is showing standard errors or simply standard deviations...
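For reference, this is how I reproduce all three columns of that table; I believe this matches what the boot package's print method computes, but I'm not certain the std. error column is anything more than the SD of the bootstrap replicates:
#reconstructing the print(boot) columns (my understanding of print.boot):
original  <- boot$t0                     #statistic evaluated on the original fit
bias      <- colMeans(boot$t) - boot$t0  #mean of bootstrap replicates minus original
std.error <- apply(boot$t, 2, sd)        #SD of the bootstrap replicates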
Final Question(s):
Can anyone explain the above methods and their respective strengths and shortcomings?
Which of these above methods is most appropriate?
Is there an alternative approach I should instead use?
Update:
Here is my final attempt/approach (having not yet received a response to my question):
#load packages:
library(lme4)
library(ggplot2)
#Build model:
model <- lmer(uptake ~ conc * factor(Type) + (1 + conc|Plant), data = CO2)
#Predict values:
pred <- predict(model, re.form = NA)
#Bootstrap CI:
boot <- bootMer(model, predict, nsim = 100, re.form = NA)
std.err <- apply(boot$t, 2, sd)
CI.lo <- pred - std.err*1.96
CI.hi <- pred + std.err*1.96
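#(alternative I'm considering: percentile CIs taken directly from the
# bootstrap distribution instead of the normal approximation above)
#CI.lo <- apply(boot$t, 2, quantile, probs = 0.025)
#CI.hi <- apply(boot$t, 2, quantile, probs = 0.975)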
#Plot
ggplot(CO2, aes(x = conc, y = uptake, colour = Type)) +
  geom_line(aes(y = pred), size = 2) +
  geom_line(aes(x = conc, y = uptake, group = Plant), alpha = 0.2) +
  geom_ribbon(aes(ymin = CI.lo, ymax = CI.hi), size = 2, alpha = 0.03, linetype = 0)
Other than being a really bad model fit, does this approach seem legitimate and sound?