I am trying to calculate $R^2$ (variance explained) for a set of data using GLMMs. Here's some dummy data.
set.seed(7127)
# Number sampled
n = 10000
# Average height
m = 180
leg = rnorm(n, 75, 5)
torso = rnorm(n, 75, 5)
head = rnorm(n, 30, 1)
df = data.frame(1:n, sample(leg, n, replace = T), sample(torso, n, replace = T), sample(head, n, replace = T))
df$height = rowSums(df[,2:4])
colnames(df) = c("person", "leg", "torso", "head", "height")
# Random grouping variable (each person assigned to group 1 or 2)
df$random = factor(sample(1:2, n, replace = T))
This is height data for 10,000 people, measured as three components (leg, torso, and head) that sum to give total height. I want an $R^2$ value for each of the three components, where $R^2$ is how much of the variance in height is explained by leg length, torso length, and head length individually.
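As a quick sanity check on the dummy data (not part of the modelling itself): because the three components are simulated independently, each one's "true" share of the height variance is roughly its own variance divided by the total variance.

# True variance shares implied by the simulation (variances: leg 25, torso 25, head 1)
var(df$leg)   / var(df$height)   # ~ 25/51 ~ 0.49
var(df$torso) / var(df$height)   # ~ 25/51 ~ 0.49
var(df$head)  / var(df$height)   # ~ 1/51  ~ 0.02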
To do this I have made three models where each component is modelled as a fixed effect, each with a random intercept for group (people are randomly assigned to group 1 or 2, which could be analogous to sex [male or female]), plus a random-effects-only null model (order in the code is null, leg, torso, head).
library(lme4)
mod0 = lmer(height ~ 1 + (1|random), data = df)
mod0
modL = lmer(height ~ leg + (1|random), data = df)
modL
modT = lmer(height ~ torso + (1|random), data = df)
modT
modH = lmer(height ~ head + (1|random), data = df)
modH
Following the protocol of Nakagawa and Schielzeth (2013) I have tried to calculate Marginal and Conditional $R^2$ values for each component of height (order in code is leg, torso, head).
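For reference, the two quantities from Nakagawa & Schielzeth (2013) that the code below computes are, with $\sigma_f^2$ the variance of the fixed-effect predictions, $\sigma_\alpha^2$ the random-intercept variance, and $\sigma_\varepsilon^2$ the residual variance:

$$R^2_{\text{marginal}} = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2}, \qquad R^2_{\text{conditional}} = \frac{\sigma_f^2 + \sigma_\alpha^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2}$$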
# Marginal R squared
VarFixedL = var(fixef(modL)[2]*getME(modL,"X")[,2])
R2M_L = VarFixedL / (VarFixedL + VarCorr(modL)$random[1] + attr(VarCorr(modL),"sc")^2)
R2M_L
VarFixedT = var(fixef(modT)[2]*getME(modT,"X")[,2])
R2M_T = VarFixedT / (VarFixedT + VarCorr(modT)$random[1] + attr(VarCorr(modT),"sc")^2)
R2M_T
VarFixedH = var(fixef(modH)[2]*getME(modH,"X")[,2])
R2M_H = VarFixedH / (VarFixedH + VarCorr(modH)$random[1] + attr(VarCorr(modH),"sc")^2)
R2M_H
# Conditional R squared
R2C_L = (VarFixedL + VarCorr(modL)$random[1]) / (VarFixedL + VarCorr(modL)$random[1] + attr(VarCorr(modL),"sc")^2)
R2C_L
R2C_T = (VarFixedT + VarCorr(modT)$random[1]) / (VarFixedT + VarCorr(modT)$random[1] + attr(VarCorr(modT),"sc")^2)
R2C_T
R2C_H = (VarFixedH + VarCorr(modH)$random[1]) / (VarFixedH + VarCorr(modH)$random[1] + attr(VarCorr(modH),"sc")^2)
R2C_H
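For comparison, the r.squaredGLMM function in the MuMIn package implements the same Nakagawa & Schielzeth calculation, so it can serve as a cross-check on the hand-rolled values above (the exact output layout differs between MuMIn versions):

library(MuMIn)
# Returns marginal (R2m) and conditional (R2c) R squared for a fitted merMod
r.squaredGLMM(modL)
r.squaredGLMM(modT)
r.squaredGLMM(modH)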
The results suggest that the variance in height is largely explained by leg and torso length (~49% each). (Edit: I have tested this on the sample data provided by the Nakagawa & Schielzeth paper and reproduced their results, but would appreciate feedback on whether this is correct).
> R2M_L
[1] 0.4869814
> R2M_T
[1] 0.4909379
> R2M_H
[1] 0.02097508
> R2C_L
[1] 0.4871594
> R2C_T
[1] 0.4909379
> R2C_H
[1] 0.02097508
Here the two types of $R^2$ are (virtually) identical. If I introduce extra variance via the random group (e.g. df$height = ifelse(df$random == "2", df$height, df$height + rnorm(n, 10, 1)) and refit), I get differences between the two types. Could someone explain the difference between the marginal and conditional $R^2$? When would one be more appropriate than the other? And why do they react differently to the addition of random or unmeasured variance components? (A sketch of that group-shift experiment is below.)
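For reference, here is the experiment I mean (a sketch: I use a new column height2 so the original height is kept, and the ~10 cm shift added to group 1 is arbitrary):

# Add a group-level shift of about 10 cm to group 1 only, then refit one model
df$height2 = ifelse(df$random == "2", df$height, df$height + rnorm(n, 10, 1))
modL2 = lmer(height2 ~ leg + (1|random), data = df)
VarFixedL2 = var(fixef(modL2)[2]*getME(modL2,"X")[,2])
VarRandL2 = VarCorr(modL2)$random[1]
VarResidL2 = attr(VarCorr(modL2),"sc")^2
# The conditional R squared now exceeds the marginal one, since only it counts the group variance
R2M_L2 = VarFixedL2 / (VarFixedL2 + VarRandL2 + VarResidL2)
R2C_L2 = (VarFixedL2 + VarRandL2) / (VarFixedL2 + VarRandL2 + VarResidL2)
R2M_L2
R2C_L2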
This is already implemented in the MuMIn library, so you could use the already implemented version. – Tim Mar 24 '15 at 08:01