
I have a highly skewed dataset. However, my MODEL of choice below gives drastically improved, approximately normally distributed residuals (and predicted values) compared with other models in which the residual variance is not modeled.

Two questions:

1- Does MODEL below assume that the data for each subject come from normal populations whose variances differ across levels of X1_categorical and are also a power function of X2_numeric?

2- Does the distribution of the residuals (histogram below) tell us anything about the distribution of any part of the data (e.g., the data for each subject, or the marginal distribution across all subjects)?
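For question 2 it may help to write out what `type = "normalized"` computes. According to `?nlme::residuals.lme`, normalized residuals are the standardized residuals pre-multiplied by the inverse square-root factor of the estimated error correlation matrix. A sketch in my own notation (assumed here, not the package's): $\hat\mu_{ij}$ is the fixed-effects fit, $\hat b_i$ the predicted random intercept, $g_{ij}$ the level of X1_categorical, $x_{ij}$ the value of X2_numeric, $\hat\delta_g$ the varIdent multipliers, $\hat t$ the varPower coefficient and $\hat C_i$ the estimated corSymm correlation matrix for subject $i$:

$$
\hat e_{ij} = y_{ij} - \hat\mu_{ij} - \hat b_i,\qquad
\hat\lambda_{ij} = \hat\delta_{g_{ij}}\,|x_{ij}|^{\hat t},\qquad
s_{ij} = \frac{\hat e_{ij}}{\hat\sigma\,\hat\lambda_{ij}},\qquad
r_i = \hat C_i^{-1/2}\, s_i .
$$

If the model is adequate, the $r_{ij}$ should look roughly like independent $N(0,1)$ draws. Because they are computed after subtracting $\hat b_i$ and after removing the modeled heteroscedasticity and within-subject correlation, they speak to the within-subject error distribution rather than directly to the marginal distribution of $y$.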

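library(nlme)

# Random intercept per subject, unstructured within-subject correlation,
# and a residual SD that differs by X1_categorical and scales as |X2_numeric|^t
MODEL <- lme(y ~ X1_categorical + X2_numeric,
             random = ~ 1 | subject,
             data = data,
             correlation = corSymm(form = ~ 1 | subject),
             weights = varComb(varIdent(form = ~ 1 | X1_categorical),
                               varPower(form = ~ X2_numeric)))

hist(resid(MODEL, type = "normalized"))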
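To make question 1 concrete, here is a minimal base-R sketch that simulates data from the process MODEL appears to assume: a normal random intercept per subject, then normal within-subject errors whose SD is `sigma * delta_g * |X2_numeric|^t` (varComb multiplies the varIdent and varPower components) and whose correlation is an unstructured matrix `C`. All parameter values, the two levels "A"/"B", and the names `one_subject` and `sim` are hypothetical, chosen only for illustration.

```r
# Minimal sketch: simulate data from the process MODEL implies.
# Every parameter value here is hypothetical, chosen only for illustration.
set.seed(1)

n_subj  <- 200    # number of subjects
n_obs   <- 4      # observations per subject
b0      <- 2      # intercept (reference level "A" of X1_categorical)
b_B     <- 1      # effect of level "B" vs "A"
b_x     <- 0.5    # slope of X2_numeric
sigma_b <- 1      # SD of the random intercept
sigma   <- 0.5    # residual SD scale
delta   <- c(A = 1, B = 2)  # varIdent SD multipliers (reference fixed at 1)
t_pow   <- 0.7    # varPower exponent: SD multiplier is |X2_numeric|^t_pow

# Hypothetical unstructured within-subject correlation matrix (what corSymm estimates)
C <- matrix(c(1.0, 0.3, 0.2, 0.1,
              0.3, 1.0, 0.4, 0.2,
              0.2, 0.4, 1.0, 0.3,
              0.1, 0.2, 0.3, 1.0), nrow = n_obs)
L <- t(chol(C))   # lower-triangular factor with C = L %*% t(L)

one_subject <- function(i) {
  g  <- sample(c("A", "B"), n_obs, replace = TRUE)  # X1_categorical
  x2 <- runif(n_obs, 0.5, 5)                        # X2_numeric
  # varComb multiplies the two variance functions, so each SD is
  # sigma * delta_g * |x2|^t
  sd_ij <- sigma * unname(delta[g]) * abs(x2)^t_pow
  z     <- as.vector(L %*% rnorm(n_obs))     # correlated N(0, 1) errors with correlation C
  eps   <- sd_ij * z                         # eps ~ N(0, Lambda C Lambda), Lambda = diag(sd_ij)
  b_i   <- rnorm(1, mean = 0, sd = sigma_b)  # random intercept for subject i
  mu    <- b0 + b_B * (g == "B") + b_x * x2  # fixed-effects mean
  data.frame(subject = i, X1_categorical = g, X2_numeric = x2,
             y = mu + b_i + eps, sd_ij = sd_ij, z_std = z)
}

sim <- do.call(rbind, lapply(seq_len(n_subj), one_subject))
```

In this sketch each `y` is normal only conditionally on its subject's intercept and covariates; `sim` could then be passed as `data = sim` to the `lme()` call above to see how well the assumed parameters are recovered.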

1 Answer


From the correlation and weights arguments in the model call, it appears that you have specified a fully unstructured variance-covariance matrix. This is often a good choice, since it imposes no a priori structure and lets the data do the talking.
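As I understand nlme's formulation (notation mine: $g_{ij}$ is the level of X1_categorical, $x_{ij}$ the value of X2_numeric, $\delta_g$ the varIdent multipliers with $\delta = 1$ for the reference level, $t$ the varPower coefficient, and $C_i$ the corSymm correlation matrix), the covariance structure for subject $i$ with $n_i$ observations can be written as:

$$
y_i = X_i\beta + \mathbf{1}_{n_i} b_i + \varepsilon_i,\qquad
b_i \sim N(0,\sigma_b^2),\qquad
\varepsilon_i \sim N\!\left(0,\ \sigma^2 \Lambda_i C_i \Lambda_i\right),
$$

$$
\Lambda_i = \operatorname{diag}(\lambda_{i1},\dots,\lambda_{in_i}),\qquad
\lambda_{ij} = \delta_{g_{ij}}\,|x_{ij}|^{t},
$$

$$
\operatorname{Var}(y_i) = \sigma_b^2\,\mathbf{1}_{n_i}\mathbf{1}_{n_i}^\top + \sigma^2 \Lambda_i C_i \Lambda_i ,
$$

with $b_i$ independent of $\varepsilon_i$. In this form the correlations in $C_i$ are unstructured, while the standard deviations on the diagonal of $\Lambda_i$ follow the function given in `weights =`.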

I think your understanding in the first question is correct, but I'm not sure what you are getting at in your second question.

Robert Long
  • From ?nlme::varPower, "the power variance function is defined as s²(v) = |v|^(2t), where t is the variance function coefficient", i.e. the standard deviation scales as |v|^t, the t-th power of the specified covariate – Ben Bolker Dec 20 '23 at 17:56
  • Dear Robert, thank you. corSymm specifies an unstructured correlation matrix for the level-1 residuals, not a compound-symmetric one. Also, could you please connect my 2nd question to your answer where you say: "A histogram of residuals may indicate that they are plausibly normally distributed"? – Simon Harmel Dec 20 '23 at 21:00
  • Doh, I should really read the documentation before answering sometimes! You are right, it's not compound symmetric. I will amend my answer a bit later. – Robert Long Dec 21 '23 at 14:45
  • To be honest, I think my first question (the correct one) is also not fully answered. I already know what corSymm does; my first question was about the weights= argument and the data-generating process it implies. My 2nd question (the incorrect one) has to do with the fact that in a linear model each observed response y_ij is modeled as the sum of a predicted value and the deviation of y_ij from that predicted value (the residual). I thought that, under normality of the response for each subject or across all subjects, the level-1 residuals would reflect the distribution of y_ij. – Simon Harmel Dec 21 '23 at 22:20
  • @BenBolker, is the model I sketched in my question essentially a location-scale model? – Simon Harmel Dec 22 '23 at 15:53
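Written out for a single observation, MODEL specifies both a conditional mean and a conditional log standard deviation as functions of the covariates. The notation is mine and assumes treatment contrasts, so $\beta_g = 0$ and $\delta_g = 1$ for the reference level of X1_categorical; $x_{ij}$ is X2_numeric and $t$ the varPower coefficient:

$$
\mathrm{E}(y_{ij}\mid b_i) = \beta_0 + \beta_{g_{ij}} + \beta_2\, x_{ij} + b_i,
\qquad
\log \operatorname{SD}(y_{ij}\mid b_i) = \log\sigma + \log\delta_{g_{ij}} + t\,\log|x_{ij}| .
$$

The log residual SD is thus linear in the group indicators and in $\log|x_{ij}|$, alongside a linear model for the mean, which is the kind of structure the question about a location-scale model refers to.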