
I'm getting a very large variance for my random effects

Random effects:
 Groups   Name        Variance  Std.Dev.
 sub      (Intercept) 8.429e+07 34201   
 Residual             9.983e+09 48821   
Number of obs: 17128, groups:  sub, 497

What could be the reason for this? Does it mean that something is wrong with my data or model? What kind of diagnostics would you recommend I do?
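
(For reference, the output above would come from a call of the form below; the formula is the one given in the comments, with dat as the data frame.)

    library(lme4)
    fit <- lmer(Y ~ f1 * f2 + (1 | sub), data = dat)  # formula per the comments below
    summary(fit)  # prints the "Random effects:" table shown above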

locus
  • @a_statistician, yes, var(dat$Y) > 9.983e+09. But what do you mean by two estimates of variance? And why is the variance so large in the first place? – locus Oct 29 '18 at 19:49
  • I see. So if my random effects model is Y ~ f1*f2 + (1|sub), the no-random-effects model would be Y ~ f1*f2. The problem is that lmer requires random effects in the formula, otherwise it gives me an error... What you are suggesting would be like conducting a simple multiple regression, lm(Y ~ f1*f2, data=dat)? (See the sketch after these comments.) – locus Oct 29 '18 at 20:03
  • @a_statistician. Yes, the MSE is also greater than 9.983e+09. It's actually very close to var(dat$Y). Just for my understanding, why is it important that these two estimates are lower than the 9.983e+09 estimate? – locus Oct 29 '18 at 20:15
  • In the absence of any indication about the data values, "large" is completely meaningless. For instance, if your random effects have a variance of one square kilometer and you re-express them in microns, the variance is $10^{18}$ square microns, but it doesn't mean anything different than it did before. – whuber Oct 29 '18 at 20:53
  • @a_statistician. Thanks, that's very helpful. Would you like to turn your comments into an answer so that I can accept it? – locus Oct 29 '18 at 23:49
  • @whuber I get your point, but that's exactly my problem. I don't know if there is actually an issue with my data/model, so I'm just looking for diagnostic tools that might help me understand whether this variation is sensible or not. The range(dat$Y) is [34, 8.27e+05] so there is a lot of variation in my data – locus Oct 30 '18 at 00:05
  • The standard deviations you report are just a tiny fraction of that range. In that sense, you're getting a small variance, not a large one. It would be more useful to report a better description of the variation of the responses: after all, the range could depend on a single outlying value. – whuber Oct 30 '18 at 00:55
  • I have a similar issue and it would seem that what "appeared" like large variance estimates are actually because I am using meters instead of kilometers because when I "convert" my variance estimates in km (by dividing by 1000) the variance is no longer so "big". – Blundering Ecologist May 07 '19 at 14:25
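
For concreteness, here is a minimal R sketch of the checks discussed in the comments above. It assumes the asker's data frame dat with response Y, factors f1 and f2, and grouping variable sub (the names used in the comments), so treat it as a template rather than a reproduction of the actual analysis.

    # Overall variance of the response: the reference point for everything below
    var(dat$Y)

    # Fixed-effects-only model: its mean squared error should not exceed var(Y)
    fit_lm <- lm(Y ~ f1 * f2, data = dat)
    sum(residuals(fit_lm)^2) / df.residual(fit_lm)  # MSE of the lm fit

    # A summary of the spread of Y that does not hinge on a single outlying
    # value, as suggested above (quantiles rather than the range)
    quantile(dat$Y, probs = c(0, 0.01, 0.25, 0.5, 0.75, 0.99, 1))
    sd(dat$Y)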

1 Answer


Your results are uncommon. We generally record research data in units that keep the values below about 10,000; for example, we would measure the height of a tree in meters rather than in millimeters. If your data follow this common practice, it is hard to end up with residual variance estimates this large, so it is possible that the program ran into a problem when analyzing your data.
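
To illustrate the units point, here is a minimal sketch under the same assumptions about dat and the variable names as in the comments: rescaling the response by a constant multiplies every variance estimate by the square of that constant, without changing anything substantive about the fit.

    library(lme4)

    # Same model fit to the raw and to a rescaled response
    fit_raw    <- lmer(Y ~ f1 * f2 + (1 | sub), data = dat)
    fit_scaled <- lmer(I(Y / 1000) ~ f1 * f2 + (1 | sub), data = dat)

    VarCorr(fit_raw)     # variance components on the original scale
    VarCorr(fit_scaled)  # same structure; each variance smaller by a factor of 1000^2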

$SST=\sum_i(Y_i-\bar Y)^2$ gives the upper bound on the estimated residual variance: $SST/(n-1)$ is the residual variance estimate from the simplest possible model, one containing only an intercept. When you add fixed effects to the model, part of SST is explained by them, so the residual sum of squares (SSE), and with it the residual variance estimate, should decrease. When the random effects are added, the remaining variance is split between the residuals and the random effects. If this pattern does not hold, something is wrong: a problem in the program, special values in the data, and so on. Your fits do follow this pattern, so your situation looks OK. If you think $Var(Y)$ should not be $> 9.983e+09$, then the problem lies in the values of the Y variable in your dataset.
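
One way to check this decomposition in R, again as a sketch under the same assumptions about dat, Y, f1, f2, and sub as above: fit the three nested models and confirm that the residual variance shrinks (weakly) at each step.

    library(lme4)

    # Intercept-only model: the residual variance estimate is SST/(n-1), i.e. var(Y)
    sst <- sum((dat$Y - mean(dat$Y))^2)
    sst / (nrow(dat) - 1)

    # Adding the fixed effects: SSE, and with it the residual variance, should drop
    fit_fixed <- lm(Y ~ f1 * f2, data = dat)
    sum(residuals(fit_fixed)^2) / df.residual(fit_fixed)

    # Adding the random intercept: the remaining variance is split between the
    # residual and the between-subject components
    fit_mixed <- lmer(Y ~ f1 * f2 + (1 | sub), data = dat)
    VarCorr(fit_mixed)  # compare with the "Random effects" table in the question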

user158565