
I have a response variable that is not normally distributed (approximately Gamma). Because my data contain a lot of "contamination", I need a robust mixed-effects method that can remove its influence. I was thinking of using the robustlmm package; however, I don't know whether I can use it, since my data don't follow a normal distribution.

Does anyone know something about that?

Dekike
  • 401

1 Answer


Note that there is no requirement for your data to follow a normal distribution.

To perform certain inferences, we would like the residuals to be approximately normally distributed. However, this is not required if you are only interested in prediction.
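To make the distinction between the distribution of the data and the distribution of the residuals concrete, here is a minimal simulated sketch in base R. It uses an ordinary linear model rather than a mixed model to keep things short, and every name and value in it (n, x, y, skew) is made up for illustration rather than taken from the question.

```r
# Simulated illustration only: all names and values here are hypothetical.
set.seed(1)
n <- 500
x <- rgamma(n, shape = 2, rate = 1)    # strongly right-skewed predictor
y <- 1 + 3 * x + rnorm(n, sd = 0.5)    # linear signal plus normal errors

fit <- lm(y ~ x)

# Sample skewness, computed by hand to avoid extra packages
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_y   <- skew(y)               # clearly positive: the response is skewed
skew_res <- skew(residuals(fit))  # close to zero: residuals are roughly symmetric

hist(y, main = "Response")
qqnorm(residuals(fit)); qqline(residuals(fit))
```

The histogram of y is far from normal because x is skewed, yet the residuals sit close to the reference line in the Q-Q plot.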

robustlmm::rlmer() does not fit generalized linear mixed models, as far as I am aware, so the gamma distribution would not be available.
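As a rough sketch of what the two routes look like in code, the example below fits a Gamma GLMM with lme4::glmer() and a robust linear mixed model with robustlmm::rlmer() on the log-transformed response. The data are simulated, and the column names (id, x, y) and parameter values are assumptions made for illustration, not the asker's data. Note that the two models are not equivalent: the first models the log of the mean, the second the mean of the log, so the intercepts in particular are not directly comparable.

```r
library(lme4)       # glmer(): generalized linear mixed models
library(robustlmm)  # rlmer(): robust *linear* mixed models only

# Hypothetical data: 6 animals, device B reading (x) and device A reading (y)
set.seed(42)
d <- data.frame(id = factor(rep(1:6, each = 50)))
d$x <- rgamma(nrow(d), shape = 2, rate = 4)
u   <- rnorm(6, sd = 0.2)[as.integer(d$id)]        # random intercept per animal
mu  <- exp(-1 + 1.5 * d$x + u)                     # mean on the response scale
d$y <- rgamma(nrow(d), shape = 5, rate = 5 / mu)   # gamma response with mean mu

# Route 1: Gamma GLMM with a log link (no robustness to outlying points)
fit_glmm <- glmer(y ~ x + (1 | id), family = Gamma(link = "log"), data = d)

# Route 2: robust linear mixed model on the log scale (Gaussian model)
fit_rlmm <- rlmer(log(y) ~ x + (1 | id), data = d)

fixef(fit_glmm)
fixef(fit_rlmm)
```

Whether the log-scale robust fit is acceptable depends on whether interpreting the model on the log scale suits the research question.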

Robert Long
  • 60,630
  • Thanks @Robert Long. So, if I understood correctly, there is no restriction on using robustlmm::rlmer() in my case, right? I wonder what you mean by distinguishing between inference and prediction. In my case, Y is a measure of an animal's activity (m.s⁻²) taken with one device (A), and X is a measure of the same thing (activity) taken with another device (B), which has more restricted settings and sits in a different position than device A. The advantage of device B is that it can record for longer time periods. So Y is the more accurate measure, but X might be very useful in my field (ecology). – Dekike Sep 23 '20 at 10:27
  • Thus, I want to assess the relationship between Y and X and discuss the suitability of using device B. My idea was to establish whether the relationship between the variables is linear or exponential, and also to calculate R². What is my problem? The animals move little, so I get a gamma distribution of activity values. However, the biggest problem is that the X measurements sometimes stray far from the general trend because of the device's settings and position, producing some strange patterns in my plot of residuals vs. predicted values. – Dekike Sep 23 '20 at 10:32
  • I tried log-transforming the variables, using natural splines, and using robustlmm to remove the "contamination" (which is not really contamination). However, I couldn't completely remove the residual patterns. Now I wonder whether I could just run the model I had in mind from the beginning (a GLMM with a gamma distribution and a log link function), then show the residual patterns and conclude that more research is needed before using X instead of Y, because it is clear that some settings of X make the predictions better or worse depending on the value of X. – Dekike Sep 23 '20 at 10:39
  • Using a GLMM I get an r2m of 80%, which means that X explains 80% of the variance of Y. However, given that I have residual patterns, I don't know what to do. Your comment that a normal distribution is or is not required depending on whether I make inferences or predictions has made me think about all this. Could you give me your advice? I am a little bit lost. – Dekike Sep 23 '20 at 10:39
  • OK. Are you interested in making predictions for new data, or investigating causal relations among your data? What is your research question? – Robert Long Sep 23 '20 at 13:02
  • I am interested in predictions, not in causal relations. I want to know how well values from device B predict values from device A. In the simplest case, I guess an R² measuring the relationship between devices A and B (Y and X, respectively) would be enough. Since I have data for 6 individuals and the data distribution is gamma, I thought of a GLMM, and thus of calculating r2m instead of R². However, I get residual patterns. I guess the only solution is either log-transforming or maybe using robustlmm? – Dekike Sep 23 '20 at 15:11
  • Hi again Robert. In this post (https://stats.stackexchange.com/questions/488423/can-i-validate-a-residual-plot-although-it-has-residual-patterns-if-i-am-not-int) I explain a similar case. Note that the case in that post is slightly different: there, I create one-hour time intervals, so I have hourly mean values of Y (called a in that post), hourly mean values of X (called b in that post), and also the number of records per hour with method B. All this is because I am interested in assessing the performance of method B. – Dekike Sep 23 '20 at 16:35
  • In that example and in the one I have been explaining here, the situation is similar: I have a response variable with a gamma distribution, but when I run a GLMM I get residual patterns and I don't know what to do. Log-transforming the variables is my last option, since it changes my variables and I think it makes interpretation harder. – Dekike Sep 23 '20 at 16:37
  • Ok. I think you need to post a new question with all this additional info. – Robert Long Sep 23 '20 at 16:48
  • Hi Robert, the post I linked in my previous comment poses the problem in extended detail. That problem is what made me think about using robustlmm and ask the question posted here. Why do you ask me to make another post? Is it not clear in the other one? Thanks in advance. – Dekike Sep 23 '20 at 19:33
  • You're right. The other post is quite clear. – Robert Long Sep 23 '20 at 19:52