Random slope and intercept correlation. Not consistent in output vs manual calculation

Question

Lets say individuals are nested within each ID and I am trying to a predict level 1 outcome Y from a level 1 predictor X with random slopes and intercepts. Using the nlme package in R, I ran the following:

> library(nlme)
> set.seed(123)
> Y=rnorm(100)+seq(.01,1,.01)
> X=rep(1:10,10)
> ID=sort(rep(1:10,10))
> Model=lme(method="ML",Y~X,random=~X|ID)
> Model
Linear mixed-effects model fit by maximum likelihood
  Data: NULL 
  Log-likelihood: -136.9366
  Fixed: Y ~ X 
(Intercept)           X 
 0.45724997  0.02511926 

Random effects:
 Formula: ~X | ID
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev     Corr  
(Intercept) 0.42405364 (Intr)
X           0.01936927 -0.87 
Residual    0.91060089       

Number of Observations: 100
Number of Groups: 10 
> RandomEffects=random.effects(Model)
> RandomEffects[1:5,]
  (Intercept)            X
1 -0.29776233  0.009528802
2 -0.11581782  0.002489661
3 -0.58731522  0.022490872
4  0.09029364 -0.004322318
5 -0.11383183  0.004338493
> cor(RandomEffects)
            (Intercept)          X
(Intercept)   1.0000000 -0.9952382
X            -0.9952382  1.0000000
>

As you can see, the slope/intercept correlation for the output (-0.87) vs the one I calculated (-.995) are not consistent, although they are close. Why is this the case? How can I manually calculate and come up with the outputted correlation?

I find this inconsistency in every analysis, although the correlations are always close. I welcome any ideas. Thank you for your input.

FYI, I tried the same procedure using the lmer function in the lme4 package and got the exact same results. — Hotaka, May 21 '15 at 10:54
Yup. I did. slightly different results, but the correlations are still inconsistent — Hotaka, May 21 '15 at 15:59

alexforrence · Accepted Answer · 2015-05-22T18:13:24.450

You're conflating two slightly different concepts. In short, the output of VarCorr() is based on the estimated variance-covariance matrix of the unconditional distribution of the random effects, and is not dependent on the observed data $\mathbf{\mathcal{Y}}$. The correlations that you calculated manually are based on the conditional modes of the random effects, which are where the density of $\mathbf{\mathcal{Y}}$ is maximal given the random effects, which means that they are conditional on the observed data (paraphrased from here).

The correlation output of VarCorr() (what you see in the output of print(model)) is based on the estimate of $\mathbf{\Sigma}$ for the marginal, or unconditional, distribution of the random effects, given by

$\mathbf{\mathcal{B}} \backsim \mathcal{N}(\mathbf{0},\mathbf{\Sigma})$

Where $\mathbf{\mathcal{B}}$ is the multivariate normal vector of random effects, with mean 0 and q x q variance-covariance matrix $\mathbf{\Sigma}$, and q is the number of columns in the random effects model matrix (see the lme4 paper for details, and you can check in lme4 with getME(model, "q")).

The conditional modes are where the density of the conditional distribution of $(\mathbf{\mathcal{Y}}|\mathbf{\mathcal{B}} = \mathbf{\mathcal{b}})$ takes the highest value. They are calculated for each level of the grouping factor of the random effect, and so the correlation of these can change depending on what data you've observed.

As for figuring out the manual calculation, see fortunes::fortune(250). The relevant code is in nlme:::VarCorr.pdMat or lme4:::mkVarCorr (I think the lme4 code is a little easier to read). I think the way lme4 does it is:

Takes $\sigma$ (the residual standard deviation/scale factor), random effect component names, number of terms for each component, and $\theta$ (the random effects parameters as Cholesky factors) as input.
For each random effects term $\mathcal{i}$,
- Takes the cross product of the scale factor and the transpose of $\Lambda_i$ (which is dependent on $\theta$), which gives the variance-covariance matrix
- Calculates the standard deviation as the square root of the diagonal of the vcov matrix
- Calculates the correlation as covariance (non-diagonal terms) divided by the standard deviation

A (not very flexible) version of how mkVarCorr calculates the correlation:

library(lme4)

fm1 <- lmer(Reaction ~ Days + (Days|Subject), data = sleepstudy, REML = FALSE)

getRanCorr <- function(mod) {
    n_ranefs <- ncol(ranef(mod)[[1]])
    theta <- getME(mod, "theta")
    sig <- sigma(mod)    
    Li <- diag(nrow = n_ranefs)

    Li[lower.tri(Li, diag = TRUE)] <- theta
    val <- tcrossprod(sig * Li)
    stddev <- sqrt(diag(val))    
    corr <- t(val/stddev)/stddev # or cov2cor(val), but I suspect there's a
                                 # reason for this
    corr[2,1]
    }

getRanCorr(fm1)
##[1] 0.08131969

VarCorr(fm1)
##Groups   Name        Std.Dev. Corr 
##Subject  (Intercept) 23.7806       
##         Days         5.7168  0.081
##Residual             25.5918

And with your model,

mod <- lmer(Y ~ X + (X|ID), data = df, REML = FALSE)

getRanCorr(mod)
##[1] -0.870463

attr(VarCorr(mod)$ID,"correlation")[2,1]
##[1] -0.870463

Seems I'm headed the right direction, but I'd appreciate more information. I'm not sure what fortunes::fortune(250) is for. All it gives me is a short quote. I looked at nlme:::VarCorr.lme but it uses VarCorr within the code, which doesn't make sense. Also, lme4:::mkVarCorr looks promising but I can't find good documentation or examples. Can you show in code how I can use mkVarCorr to calculate the correlation using the data I created above? Much appreciated. — Hotaka, May 22 '15 at 17:04
@Hotaka Ah, didn't read it carefully enough -- nlme:::VarCorr.pdMat instead. As for a source for finding correlation from a variance-covariance matrix, does something like this help? I can post some hacky R code (based on mkVarCorr) in a minute. — alexforrence, May 22 '15 at 17:48
Looks great. Could you explain what theta is exactly, or lead me to another source that explains it? Everything else seems straight forward. — Hotaka, May 22 '15 at 18:20
Also, I just asked a related question. I'd appreciate it if you provided your insight: http://stats.stackexchange.com/questions/153589/random-slope-and-random-intercept-correlation-at-every-level-of-x — Hotaka, May 22 '15 at 18:25
(To mostly quote here and the ?getME), $\theta$ is the random effects parameter estimates, parameterized as relative Cholesky factors of the scaled vcov matrix. We get the original vcov matrix $\Sigma$ (val above) by doing the cross product and whatnot. — alexforrence, May 22 '15 at 18:40
Sweet. Last question. Do you think VarCorr() give you a more useful/valid correlation than cor()? — Hotaka, May 22 '15 at 19:14
You should ask a separate question, so that more people can participate easily. — alexforrence, May 22 '15 at 20:07

score 0 · Answer 2 · answered May 21 '15 at 02:42

0

This reminds me of a similar inquiry a professor brought up to me when studying autocorrelation which was why do they measure the correlation based on the residuals and not the x and y variables considering they both gave similar but different correlations like your situation and it came down to the residuals being a more conservative test... maybe the test ran in your mixed model is more conservative in some way to your approach.

answered May 21 '15 at 02:42

DaveRowan

108

Well correlations are not a significance test, so what do you mean by conservative? Maybe the output shows the correlation in the population while mine is sample-dependent? – Hotaka May 21 '15 at 10:29

Random slope and intercept correlation. Not consistent in output vs manual calculation

2 Answers2

Linked