I wish to perform linear regression over a data set whose entries can be divided into two or more groups. The groups could be, for example, the dates at which observations were taken, or the patient IDs.
It seems that this can be approached by fitting a linear mixed-effects model. Or are there simpler yet equivalent solutions that only involve standard linear models?
For example, perhaps by adding a binary indicator feature for each group that marks which group an observation belongs to (one-hot encoding), and then fitting a linear model (e.g., with lm() in R).
For example, let data1 be:
        y          x x1
-3.669049 -0.3851723  2
-4.223906 -0.4416519  2
10.443685  0.9347280  1
20.530023  2.0341488  2
 2.915306  0.1803468  1
 8.284428  0.7195443  1
-5.183832 -0.5389523  2
 3.803867  0.3585491  2
 5.212799  0.5110681  0
and its one-hot encoded version is (data2):
        y          x x_0 x_1 x_2
-3.669049 -0.3851723   0   0   1
-4.223906 -0.4416519   0   0   1
10.443685  0.9347280   0   1   0
20.530023  2.0341488   0   0   1
 2.915306  0.1803468   0   1   0
 8.284428  0.7195443   0   1   0
-5.183832 -0.5389523   0   0   1
 3.803867  0.3585491   0   0   1
 5.212799  0.5110681   1   0   0
Is lm(y ~ x + x_0 + x_1 + x_2 - 1, data2) equivalent to
lmer(y ~ x + (1|x1), data1)?
Are the two approaches equivalent, and if not, why should I choose one over the other? Can you please help me gain some intuition here? I would have said they are equivalent, but I cannot obtain the same results with the two approaches.
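To make the comparison concrete, here is a sketch of my intuition in plain Python (no R, no lme4). It uses the standard view that a random-intercept model is equivalent to the one-hot dummy regression with a ridge penalty lam on the group intercepts, where lam plays the role of the ratio of the residual variance to the group variance. All names here (solve, penalized_fit, lam = 10.0) are illustrative choices of mine, not a library API, and this is a simplification of what lmer actually fits (e.g., it omits the global fixed intercept):

```python
# Sketch: random intercepts as ridge-penalized group dummies.
# lam = 0 reproduces lm(y ~ x + x_0 + x_1 + x_2 - 1) exactly;
# lam > 0 shrinks the group intercepts toward zero, which is why
# the two approaches give different coefficients.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def penalized_fit(y, X, lam, penalized):
    """Minimize ||y - X b||^2 + lam * sum(b[j]^2 for j in penalized)."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
           for i in range(p)]
    Xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    for j in penalized:
        XtX[j][j] += lam
    return solve(XtX, Xty)

# data1 from the question: (y, x, group)
rows = [(-3.669049, -0.3851723, 2), (-4.223906, -0.4416519, 2),
        (10.443685, 0.9347280, 1), (20.530023, 2.0341488, 2),
        (2.915306, 0.1803468, 1), (8.284428, 0.7195443, 1),
        (-5.183832, -0.5389523, 2), (3.803867, 0.3585491, 2),
        (5.212799, 0.5110681, 0)]
y = [r[0] for r in rows]
# columns: x, then one-hot dummies for groups 0, 1, 2
X = [[x] + [1.0 if g == k else 0.0 for k in (0, 1, 2)] for (_, x, g) in rows]

ols = penalized_fit(y, X, 0.0, [1, 2, 3])     # = the lm() dummy fit
shrunk = penalized_fit(y, X, 10.0, [1, 2, 3])  # intercepts shrunk toward zero
```

With lam = 0 this reproduces the lm() dummy coefficients; a mixed model effectively chooses lam from the data via the estimated variance components, so its per-group intercepts are shrunk (partially pooled) versions of the dummy estimates, with the strongest shrinkage for small groups like group 0 here, which has a single observation.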
PS: I found this answer, Random effects vs one-hot encoding, but it concerns prediction accuracy, while I am interested in the regression coefficients and their interpretation.