I wish to perform linear regression over a data set whose entries can be divided into two or more groups. The groups could be, for example, the dates at which observations were taken, or the patient IDs.
It seems that this can be approached by fitting a linear mixed-effects model. Or are there simpler yet equivalent solutions that only involve standard linear models?
For example, perhaps by adding a binary indicator feature for each group that marks which group an observation belongs to (one-hot encoding), and then fitting a linear model (e.g., with lm() in R).
For example, let data1 be:
        y          x x1
-3.669049 -0.3851723  2
-4.223906 -0.4416519  2
10.443685  0.9347280  1
20.530023  2.0341488  2
 2.915306  0.1803468  1
 8.284428  0.7195443  1
-5.183832 -0.5389523  2
 3.803867  0.3585491  2
 5.212799  0.5110681  0
and its one-hot encoded version is (data2):
        y          x x_0 x_1 x_2
-3.669049 -0.3851723   0   0   1
-4.223906 -0.4416519   0   0   1
10.443685  0.9347280   0   1   0
20.530023  2.0341488   0   0   1
 2.915306  0.1803468   0   1   0
 8.284428  0.7195443   0   1   0
-5.183832 -0.5389523   0   0   1
 3.803867  0.3585491   0   0   1
 5.212799  0.5110681   1   0   0
Is lm(y ~ x + x_0 + x_1 + x_2 - 1, data2) equivalent to
lmer(y ~ x + (1|x1), data1)?
Are the two approaches equivalent, and if not, why should I choose one over the other? Can you please help me gain some intuition here? I would have said they are equivalent, but I cannot obtain the same results with the two approaches.
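To make the comparison concrete, here is a sketch of my intuition in plain Python (no R, no lme4). It uses the standard view that a random-intercept model is equivalent to the one-hot dummy regression with a ridge penalty lam on the group intercepts, where lam plays the role of the ratio of the residual variance to the group variance. All names here (solve, penalized_fit, lam = 10.0) are illustrative choices of mine, not a library API, and this is a simplification of what lmer actually fits (e.g., it omits the global fixed intercept):

```python
# Sketch: random intercepts as ridge-penalized group dummies.
# lam = 0 reproduces lm(y ~ x + x_0 + x_1 + x_2 - 1) exactly;
# lam > 0 shrinks the group intercepts toward zero, which is why
# the two approaches give different coefficients.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def penalized_fit(y, X, lam, penalized):
    """Minimize ||y - X b||^2 + lam * sum(b[j]^2 for j in penalized)."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
           for i in range(p)]
    Xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    for j in penalized:
        XtX[j][j] += lam
    return solve(XtX, Xty)

# data1 from the question: (y, x, group)
rows = [(-3.669049, -0.3851723, 2), (-4.223906, -0.4416519, 2),
        (10.443685, 0.9347280, 1), (20.530023, 2.0341488, 2),
        (2.915306, 0.1803468, 1), (8.284428, 0.7195443, 1),
        (-5.183832, -0.5389523, 2), (3.803867, 0.3585491, 2),
        (5.212799, 0.5110681, 0)]
y = [r[0] for r in rows]
# columns: x, then one-hot dummies for groups 0, 1, 2
X = [[x] + [1.0 if g == k else 0.0 for k in (0, 1, 2)] for (_, x, g) in rows]

ols = penalized_fit(y, X, 0.0, [1, 2, 3])     # = the lm() dummy fit
shrunk = penalized_fit(y, X, 10.0, [1, 2, 3])  # intercepts shrunk toward zero
```

With lam = 0 this reproduces the lm() dummy coefficients; a mixed model effectively chooses lam from the data via the estimated variance components, so its per-group intercepts are shrunk (partially pooled) versions of the dummy estimates, with the strongest shrinkage for small groups like group 0 here, which has a single observation.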
PS: I found this answer, Random effects vs one-hot encoding, but it concerns prediction accuracy, while I am interested in the regression coefficients and their interpretation.