Include single variable as both fixed and random effect

Question

I understand that it is OK, or sometimes even necessary, to include an independent variable as both a fixed and a random effect in a linear mixed model. However, what happens if the model has only a single independent variable and that variable is used as both a fixed and a random effect?

My instinct is that all the variance of the variable should be "soaked up" by the random effect and that the fixed effect should therefore have little significance. Is this a correct supposition?

Here is an example using R.

```r
data(mtcars)
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```

Let's run a mixed model with cyl as both a fixed and random effect and mpg as the dependent variable, using the lme4 package.

```r
library(lme4)
summary(lmer(mpg ~ cyl + (1 | cyl), data = mtcars))
boundary (singular) fit: see help('isSingular')
Linear mixed model fit by REML ['lmerMod']
Formula: mpg ~ cyl + (1 | cyl)
   Data: mtcars
REML criterion at convergence: 163.1
Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.55383 -0.66082  0.06917  0.33430  2.34523
Random effects:
 Groups   Name        Variance Std.Dev.
 cyl      (Intercept)  0.00    0.000
 Residual             10.28    3.206
Number of obs: 32, groups:  cyl, 3
Fixed effects:
            Estimate Std. Error t value
(Intercept)  37.8846     2.0738   18.27
cyl          -2.8758     0.3224   -8.92
Correlation of Fixed Effects:
    (Intr)
cyl -0.962
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
```

I was surprised to see a pretty hefty t value for the cyl fixed effect (although admittedly the fit is singular). Further, the variance of the cyl random effect is zero. Consistent with these observations, the t value for the cyl fixed effect in the linear mixed model is the same as for a simple linear model:


```r
summary(lm(mpg ~ cyl, data = mtcars))
Call:
lm(formula = mpg ~ cyl, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max 
-4.9814 -2.1185  0.2217  1.0717  7.5186
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.8846     2.0738   18.27  < 2e-16 ***
cyl          -2.8758     0.3224   -8.92 6.11e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared:  0.7262,    Adjusted R-squared:  0.7171 
F-statistic: 79.56 on 1 and 30 DF,  p-value: 6.113e-10
```
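The match between the two summaries can also be checked programmatically. This is a minimal sketch, assuming lme4 is installed; `fixef()`, `VarCorr()` and `isSingular()` are lme4 functions, and the object names (`fit_mixed`, `fit_lm`, `re_var`) are only illustrative. When the random-intercept variance sits on the boundary at zero, the mixed model reduces to ordinary least squares, which is why the fixed-effect estimates and standard errors coincide.

```r
library(lme4)

# Same two models as above: numeric cyl as a fixed slope, plus a random intercept per cyl level
fit_mixed <- lmer(mpg ~ cyl + (1 | cyl), data = mtcars)
fit_lm    <- lm(mpg ~ cyl, data = mtcars)

# Variance-component table; pick out the random-intercept variance for cyl
vc <- as.data.frame(VarCorr(fit_mixed))
re_var <- vc$vcov[vc$grp == "cyl"]

print(re_var)                 # on the boundary (zero)
print(isSingular(fit_mixed))  # TRUE for this boundary fit

# With a zero random-intercept variance the fixed effects equal the OLS coefficients
print(cbind(mixed = fixef(fit_mixed), lm = coef(fit_lm)))
```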

Is my understanding that the random effects should remove most significance from the fixed effect correct in this context?

Thanks!

Shawn Hemelstrand · Answer 1 · 2022-12-08T07:23:13.220

There are a number of issues with this model. First, your grouping factor does not have enough levels to support a random intercept (the 1|cyl part). The cyl variable has only three levels, so it would make an okay categorical variable as a fixed effect, but a very poor random effect. As a rule of thumb, a grouping factor should have at least five levels.

The much bigger problem is that including cyl as both a fixed and a random effect in this way makes the model completely redundant. A fixed-effects regression with cyl as a categorical variable looks something like this:

$$ \operatorname{mpg} = \beta_{0}(\operatorname{cyl}_{\operatorname{4}}) + \beta_{1}(\operatorname{cyl}_{\operatorname{6}}) + \beta_{2}(\operatorname{cyl}_{\operatorname{8}}) + \epsilon $$
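In R, this cell-means parameterisation corresponds to dropping the intercept and treating cyl as a factor. Note that in the question cyl is stored as a number, so mpg ~ cyl fits a single linear slope rather than three category means. The base-R sketch below (object names are illustrative) shows the categorical version that the equation describes.

```r
data(mtcars)

# Cell-means coding: no intercept, one coefficient per cyl level (4, 6, 8)
fit_cells <- lm(mpg ~ 0 + factor(cyl), data = mtcars)

# Raw mean of mpg within each cyl level
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Each coefficient equals the mean mpg of its cyl category
print(cbind(coef = coef(fit_cells), mean = group_means))
```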

where each coefficient is the mean value of mpg for one cyl category. When cyl is also included as a random effect, the mixed model estimates an intercept shift for each cyl level as well. That shift is also a conditional mean, very similar to the one estimated in the fixed part, so the model pointlessly counts the conditional mean of each cyl category twice.

Notice that the random-effects portion of your lme4 summary hints at this issue: the estimated standard deviation of the random intercept is exactly zero. This is because once the fixed part has estimated the mean of each category, there are no group-level shifts left for the random intercept to capture; they have already been fully estimated in the model.
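A rough way to see how little is left for the random intercept is to look at the mean residual within each cyl level after the fixed part has been fitted, since those group-level shifts are what a (1 | cyl) term would have to explain. This base-R sketch is only an illustration of the idea, not how lme4 actually computes its estimates. With factor(cyl) in the fixed part the group mean residuals are zero by construction; with the numeric cyl used in the question, they are only the small departures of the three group means from a straight line.

```r
data(mtcars)

# Fixed part as in the question: numeric cyl, i.e. a linear trend in cyl
fit_linear <- lm(mpg ~ cyl, data = mtcars)
# Fixed part with cyl as a factor: one mean per cyl category
fit_factor <- lm(mpg ~ factor(cyl), data = mtcars)

# Mean residual per cyl level = group-level shift left over for a random intercept
left_linear <- tapply(resid(fit_linear), mtcars$cyl, mean)
left_factor <- tapply(resid(fit_factor), mtcars$cyl, mean)

print(round(left_linear, 3))  # small departures from linearity
print(round(left_factor, 3))  # zero: the fixed effects already absorb every group mean
```

Compared with a residual standard deviation of about 3.2, the leftover shifts in the linear case are small, which is consistent with the variance estimate ending up on the boundary.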

There are ways of including a variable as both a fixed and a random effect (usually by adding it as a random slope, written as y ~ x + (x | group) in lme4 syntax), but that is a different matter.
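A standard illustration of that legitimate case is the sleepstudy model from the lme4 documentation: Days enters as a fixed slope (the population-average effect) and as a random slope per Subject (how each subject departs from that average). Unlike the model in the question, the grouping factor has 18 levels and is a different variable from the predictor. This sketch assumes lme4 is installed; the object name is illustrative.

```r
library(lme4)

# Days is a fixed effect (average slope) and also a random slope varying
# across the 18 subjects; Subject is only the grouping factor
fit_slope <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

print(fixef(fit_slope))       # average intercept and slope
print(VarCorr(fit_slope))     # between-subject SDs of intercept and slope
print(isSingular(fit_slope))  # not a boundary fit
```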

Thank you for your answer @Shawn Hemelstrand. I think I understood all of your points, although that may not have been clear in my question. To rephrase my question: why does the linear mixed model algorithm assign all the variance to the fixed part of the model rather than the random part? I would have expected the random part to take priority over the fixed part. — Bob, Dec 08 '22 at 07:08
This can be done if you include only the random effects by themselves, such as lmer(mpg ~ (1|cyl), data = mtcars). However, a full model with fixed effects first has to estimate the aggregated effects of the predictors, and then the conditional shifts of both random slopes and intercepts once those values are obtained. It would be impossible to do it otherwise (as far as I know). — Shawn Hemelstrand, Dec 08 '22 at 07:13
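The random-effects-only model mentioned in the comment can be checked the same way. This is a sketch assuming lme4 is installed, with illustrative object names. With no fixed cyl term, the differences in mean mpg between cyl levels can only be captured by the random intercept, so its variance estimate should come out clearly positive (although, as the answer notes, three levels is very few for estimating it).

```r
library(lme4)

# Random intercept for cyl only; no fixed cyl term
fit_re_only <- lmer(mpg ~ 1 + (1 | cyl), data = mtcars)

# Variance components: the cyl intercept variance is now well away from zero
vc <- as.data.frame(VarCorr(fit_re_only))
print(vc[, c("grp", "vcov", "sdcor")])

# Conditional modes: estimated shift of each cyl level from the overall intercept
print(ranef(fit_re_only)$cyl)
```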