3

I am planning to simulate a mixed-effects model fitted with robustlmm::rlmer to validate if the model is correctly specified or not by using DHARMa. The model formula can be written as:

fit.robust <- rlmer(y ~ x + (1 | group))

Let us assume that there are 30 groups in the data, $g \in {1, 2, ..., 30}$ and $n$ observations.

In order to unconditionally simulate this model, I am going to set up a simulation, where the conditional mode for each group, $u_g$, is going to drawn from $N(0, \hat\sigma_u)$ for each observation and then weighted by the estimated robustness weight, $w_g$. Similarly, the Gaussian error, $\epsilon_{i}$, will be drawn from $N(0, \hat\sigma_{e} )$ for each observation and weighted by the estimated robustness weight, $w_{i, e}$. Thus, a simulated response can be expressed as follows:

$y_{i, g}$ = $\hat{b}_0$ + $\hat{b}_1x_i$ + $w_{g}$. $u_{i, g}$ + $w_{i, e}$. $\epsilon_{i}$

where $i \in {1, 2, ..., n}$, and $\hat{b}_0$ and $\hat{b}_1$ are estimated intercept and estimated fixed effect for $x$, respectively.

The vectors $w_e$ (of length n) and $w_g$ (with 30 elements) could be obtained by using getME(fit.robust, "w_e") and getME(fit.robust, "w_b"), respectively. For more details, please see the manual of robustlmm.

Would this be a correct way to carry out unconditional simulation i.e. not using estimated conditional modes, to generate response vector for calculating DHARMa residuals? I am aware that I will have to calculate a simulated-response matrix than just a simulated-response vector to get DHARMa residuals.

  • 1
    I'm a bit surprised that DHARMa doesn't already support rlmer. You may consider using the createDHARMa function to generate the residuals since it isn't already supported, as advised here and perhaps contacting Florian about adding it somehow.. – Shawn Hemelstrand Dec 30 '23 at 13:16
  • 1
  • 1
    "Robust estimation" is not a generative model, so you can't simulate data from it. Instead simulate data from a mixed effects model with heterogeneity. You'd have to consider what kind of heterogeneity you want to simulate. – dipetkov Jan 06 '24 at 18:31
  • @dipetkov Thank you for your comment. Could you please elaborate on the statement "Robust estimation is not a generative model"? I was thinking till now that if we can fit a model to data to understand the association between predictor and response variables, we might as well try simulating it to see how simulated data sets look like as compared to the observed data (the rationale behind posterior predictive checks). (Cont.) – medium-dimensional Jan 06 '24 at 19:03
  • But your comment took me back to this phrase in Koller (2013, p. 11): "... we follow the so-called central model approach. That is, we assume the model to be true, but a part of the data to possibly be contaminated." Is it because we assume the model, in this case, a mixed effects model, to be true, we should simulate the data from it? – medium-dimensional Jan 06 '24 at 19:04
  • The challenge is the "possibly contaminated" bit. To generate data you would need a model for the contamination too. (You can still do it; just you have to decide what shape & form the contamination takes.) For a simpler analogy, consider simulating data with given mean and std deviation vs data with given median and IQR (which are robust to some contamination). – dipetkov Jan 06 '24 at 20:06
  • Just noticed that the robustlmm package has a "Replication Code For Simulation Studies" vignette on CRAN. Have you taken a look at it? – dipetkov Jan 06 '24 at 20:17

0 Answers0