
Say that you are in the library of your statistics department and you come across a book with the following picture on its front cover.

[Image: a front-cover picture that evokes simple linear regression]

You would probably think that it is a book about linear regression.

What picture would make you think of linear mixed models instead?

ocram

3 Answers


For a talk, I've used the following picture, which is based on the sleepstudy dataset from the lme4 package. The idea was to illustrate the difference between independent regression fits to subject-specific data (gray) and predictions from random-effects models, especially that (1) predicted values from a random-effects model are shrinkage estimators, and (2) with a random-intercept-only model (orange), individual trajectories share a common slope. The distributions of the subject intercepts are shown as kernel density estimates on the y-axis (R code).

[Figure: subject-specific OLS fits (gray) and random-effects predictions for the sleepstudy data, with kernel density estimates of the subject intercepts drawn along the y-axis]
(The density curves extend beyond the range of observed values because there are relatively few observations.)
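The exact code is linked above; as a rough sketch in the same spirit (not the original code; object names like ols and ri are just for illustration), here is one way to overlay per-subject OLS fits, a random-intercept-only fit, and the intercept density drawn vertically, as asked about in the comments below. The horizontal scaling of the density curve is arbitrary.

library(lme4)
data(sleepstudy)

## Independent per-subject OLS fits (the gray lines in the figure)
ols <- lapply(split(sleepstudy, sleepstudy$Subject),
              function(d) coef(lm(Reaction ~ Days, data = d)))

## Random-intercept-only model: a common slope, shrunken subject intercepts (orange)
ri  <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
ric <- coef(ri)$Subject

plot(Reaction ~ Days, data = sleepstudy, type = "n")
for (b in ols) abline(b[1], b[2], col = "gray")
for (i in seq_len(nrow(ric))) abline(ric[i, 1], ric[i, 2], col = "orange")

## Kernel density of the per-subject OLS intercepts, drawn vertically on the y-axis
d <- density(sapply(ols, `[`, 1))
lines(d$y / max(d$y) * 1.5, d$x, lty = 2)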

A more 'conventional' graphic might be the next one, from Doug Bates (available on the R-forge site for lme4, e.g. in 4Longitudinal.R), where we could add the individual data in each panel.

[Figure: lattice-style plot with one panel per subject, each showing that subject's data and a fitted line]
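A lattice sketch along those lines (assuming the same sleepstudy data; this is not Bates' exact script) could be:

library(lme4)      # for the sleepstudy data
library(lattice)

xyplot(Reaction ~ Days | Subject, data = sleepstudy,
       type = c("p", "r"),    # raw points plus a per-subject least-squares line
       layout = c(6, 3),
       xlab = "Days of sleep deprivation",
       ylab = "Average reaction time (ms)")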

chl
  • +1. Good one! I think your first plot is great on a conceptual level. My only comment would be that it requires significantly more explanation than a standard "naive" plot, and if the audience isn't up to speed with the concepts of LME models and longitudinal data they might miss the point of the plot. I will definitely remember it for a solid "stats talk" though. (I have already seen the second plot in the "lme4 book" a couple of times. I wasn't too impressed then and I am not too impressed now either.) – usεr11852 Mar 03 '13 at 06:44
  • @chl: Thanks! I will choose amongst the proposals. In the meanwhile, +1 – ocram Mar 03 '13 at 07:22
  • @user11852 My understanding of the RI model is that the OLS estimates are correct, but their standard errors aren't (because of the lack of independence), so that individual predictions will be incorrect as well. Usually, I would show the overall regression line assuming independent observations. Then, theory tells us that combining the conditional modes of the random effects with the estimates of the fixed effects yields the conditional modes of the within-subject coefficients (see the small sketch after these comments), and there will be little shrinkage when statistical units are different, or when measurements are accurate, or with large samples. – chl Mar 03 '13 at 09:06
  • @chl: I agree; as I mentioned originally, the whole idea of using "groupings" is exactly because one originally identifies "groups of heteroskedasticity in the residuals of an OLS plot". (So we practically have $y|\gamma \sim N(X\beta + Z\gamma, \sigma^2 I)$ conditionally, or $y \sim N(X\beta, ZDZ^T + \sigma^2 I)$ unconditionally.) – usεr11852 Mar 03 '13 at 14:49
  • The link to the R code to create the picture is broken. I would be interested in how to draw the distributions vertically in the figure. – Niels Hameleers Mar 28 '19 at 07:04
  • @chl I am dying to know the R code for the first plot of your answer. I would like to know how the lines of the random intercept and slope are drawn, but the link to the R code seems to be broken. I would appreciate it greatly if you could update it. – Doong Nov 20 '19 at 14:31
  • @PSY Broken link fixed. – chl Sep 14 '20 at 17:58
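As a concrete illustration of the point in the comments about combining the fixed effects with the conditional modes of the random effects, a minimal lme4 sketch (again with the sleepstudy data):

library(lme4)
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

fixef(fit)                  # population-level intercept and slope
ranef(fit)$Subject[1:3, ]   # conditional modes (subject-level deviations), first 3 subjects
coef(fit)$Subject[1:3, ]    # their sum: the subject-specific (within-subject) coefficients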

So, something not "extremely elegant", but showing random intercepts and slopes too, with R. (I guess it would be even cooler if it showed the actual equations as well.)

[Figure: left panel, all readings pooled; right panel, per-group readings with the conditional fits from the mixed model]

N = 100; set.seed(123)    # 100 observations per group


# Five groups with different true intercepts and slopes
x1 = runif(N)*3; readings1 <- 2*x1 + 1.0 + rnorm(N)*.99;
x2 = runif(N)*3; readings2 <- 3*x2 + 1.5 + rnorm(N)*.99;
x3 = runif(N)*3; readings3 <- 4*x3 + 2.0 + rnorm(N)*.99;
x4 = runif(N)*3; readings4 <- 5*x4 + 2.5 + rnorm(N)*.99;
x5 = runif(N)*3; readings5 <- 6*x5 + 3.0 + rnorm(N)*.99;

X = c(x1,x2,x3,x4,x5);
Y = c(readings1,readings2,readings3,readings4,readings5)
Grouping  = c(rep(1,N),rep(2,N),rep(3,N),rep(4,N),rep(5,N))

library(lme4);
LMERFIT <- lmer(Y ~ 1 + X + (X|Grouping))   # random intercept and random slope for each group

RIaS <- unlist(ranef(LMERFIT))   # elements 1-5: random intercepts; 6-10: random slopes
FixedEff <- fixef(LMERFIT)    # Fixed Intercept and Slope

png('SampleLMERFIT_withRandomSlopes_and_Intercepts.png', width=800,height=450,units="px" )
par(mfrow=c(1,2))
plot(X,Y,xlab="x",ylab="readings")
plot(x1,readings1, xlim=c(0,3), ylim=c(min(Y)-1,max(Y)+1), pch=16,xlab="x",ylab="readings" )
points(x2,readings2, col='red', pch=16)
points(x3,readings3, col='green', pch=16)
points(x4,readings4, col='blue', pch=16)
points(x5,readings5, col='orange', pch=16)
abline(v=(seq(-1,4 ,1)), col="lightgray", lty="dotted");        
abline(h=(seq( -1,25 ,1)), col="lightgray", lty="dotted")   

lines(x1,FixedEff[1]+ (RIaS[6] + FixedEff[2])* x1+ RIaS[1], col='black')
lines(x2,FixedEff[1]+ (RIaS[7] + FixedEff[2])* x2+ RIaS[2], col='red')
lines(x3,FixedEff[1]+ (RIaS[8] + FixedEff[2])* x3+ RIaS[3], col='green')
lines(x4,FixedEff[1]+ (RIaS[9] + FixedEff[2])* x4+ RIaS[4], col='blue')
lines(x5,FixedEff[1]+ (RIaS[10]+ FixedEff[2])* x5+ RIaS[5], col='orange') 
legend(0, 24, c("Group1","Group2","Group3","Group4","Group5" ), lty=c(1,1), col=c('black','red', 'green','blue','orange'))
dev.off()
usεr11852
  • Thanks! I'll wait a little longer for potential new answers... but I might build on this one. – ocram Mar 02 '13 at 18:18
  • I am a bit confused by your figure, because the right subplot looks to me as if a separate regression line were fit to each group. Isn't the whole point that mixed-model fits should be different from independent per-group fits? Perhaps they are, but in this example it is really hard to notice, or am I missing something? – amoeba Jan 08 '15 at 00:47
  • Yes, the coefficients are different. No, a separate regression was not fit for each group; the conditional fits are shown. In a perfectly balanced, homoskedastic design like this one the difference will indeed be hard to notice; for example, group 5's conditional intercept is 2.96 while the independent per-group intercept is 3.00 (see the comparison sketch after these comments). It is the error covariance structure you are changing. Check chl's answer too: it has more groups, but even there the fit is "a lot different" visually in very few cases. – usεr11852 Jan 09 '15 at 14:12
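For reference, the shrinkage mentioned above can be inspected directly by comparing the conditional per-group coefficients with independent per-group OLS fits; a sketch, reusing the objects from the code above:

coef(LMERFIT)$Grouping      # conditional (slightly shrunken) per-group intercepts and slopes

## Independent per-group OLS fits, for comparison
sapply(split(data.frame(X, Y), Grouping),
       function(d) coef(lm(Y ~ X, data = d)))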

Not my work

This graph, taken from the MATLAB documentation of nlmefit, strikes me as one that exemplifies the concept of random intercepts and slopes quite clearly. Probably something showing groups of heteroskedasticity in the residuals of an OLS plot would also be pretty standard, but it wouldn't suggest a "solution" (see the sketch below).
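For the "groups of heteroskedasticity in the OLS residuals" idea, here is a rough sketch with simulated (hypothetical) data: a single pooled regression is fit while ignoring the grouping, and the residuals are then coloured by group.

set.seed(1)
g <- rep(1:5, each = 50)                         # five groups
x <- runif(250, 0, 3)
y <- (1 + g) * x + g / 2 + rnorm(250, sd = 0.5)  # group-specific intercepts and slopes

pooled <- lm(y ~ x)                              # pooled fit that ignores the grouping
plot(x, resid(pooled), col = g, pch = 16,
     ylab = "Residuals from the pooled OLS fit")
legend("topleft", legend = paste("Group", 1:5), col = 1:5, pch = 16)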

usεr11852
  • Thanks for your suggestion. Although it looks like mixed logistic regression, I guess I can easily adapt it. I'll wait for more suggestions. In the meanwhile, +1. Thanks again. – ocram Mar 02 '13 at 12:36
  • It looks like a mixed logistic regression mostly because it is one... :) It was the first plot that popped into my mind, really. I'll give something purely R-ish in a second answer. – usεr11852 Mar 02 '13 at 13:01