
I am trying to understand how mixed models work. I have heard people say that adding random effects does not change the effect estimates of the fixed effects. However, I have also heard that mixed models allow exploring the effects of a factor within related observations, as opposed to treating all observations as independent. My question is:

Suppose that there was a positive relationship between two variables, x and y, but within each level of some factor z the relationship was negative (like in Figure 2 on this webpage: https://stats.idre.ucla.edu/other/mult-pkg/introduction-to-linear-mixed-models/).

Would adding a random intercept for z change the coefficient for x from positive to negative?

tormad

2 Answers


Yes.

This is an example of Simpson's paradox. There are already plenty of resources online explaining Simpson's paradox, so I won't go into it here.

To see this in action, let's look at simulated behavioural data where

  • Participants produce a response, $y$, to varying stimuli, $x$;
  • Participants' intercepts are normally distributed, $\alpha_p \sim N(0, 1)$;
  • Participants with higher intercepts are exposed to higher average values of $x$, $\bar x_p = 2\times \alpha_p$;
  • Responses are drawn from the distribution $y \sim N(\alpha_p - .5\times(x - \bar x_p),\ 1)$.
library(tidyverse)
library(lme4)

n_subj = 5
n_trials = 20
subj_intercepts = rnorm(n_subj, 0, 1)  # Varying intercepts
subj_slopes = rep(-.5, n_subj)         # Everyone has the same slope
subj_mx = subj_intercepts*2            # Mean stimulus depends on intercept

# Simulate data

data = data.frame(subject = rep(1:n_subj, each=n_trials),
                  intercept = rep(subj_intercepts, each=n_trials),
                  slope = rep(subj_slopes, each=n_trials),
                  mx = rep(subj_mx, each=n_trials)) %>%
  mutate(x = rnorm(n(), mx, 1),
         y = intercept + (x-mx)*slope + rnorm(n(), 0, 1))

subject_means = data %>%
  group_by(subject) %>%
  summarise_if(is.numeric, mean)

subject_means %>% select(intercept, slope, x, y) %>% plot()

# Plot

ggplot(data, aes(x, y, color=factor(subject))) +
  geom_point() +
  stat_smooth(method='lm', se=F) +
  stat_smooth(group=1, method='lm', color='black') +
  labs(x='Stimulus', y='Response', color='Subject') +
  theme_bw(base_size = 18)


The black line shows the regression line collapsing across subjects; the coloured lines show individual subjects' regression lines. Note that the slope is the same for all subjects: apparent differences in the plot are due to noise.

# Model without random intercept
lm(y ~ x, data=data) %>% summary() %>% coef()
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) -0.1851366 0.16722764 -1.107093 2.709636e-01
## x            0.2952649 0.05825209  5.068743 1.890403e-06

# Model with random intercept

lmer(y ~ x + (1|subject), data=data) %>% summary() %>% coef()
##               Estimate Std. Error   t value
## (Intercept) -1.4682938 1.20586337 -1.217629
## x           -0.5740137 0.09277143 -6.187397
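
As a rough sketch of why this happens (using the subject_means computed above; this check is not needed for the result itself), one can compare the fitted random intercepts to each subject's mean stimulus. The random intercepts absorb the between-subject variation, leaving the fixed slope for x to reflect the within-subject (negative) relationship.

# Sketch of a sanity check: the estimated random intercepts should roughly
# track each subject's mean stimulus, since that between-subject variation
# is what they absorb out of the fixed slope.
fit = lmer(y ~ x + (1|subject), data=data)
cbind(ranef(fit)$subject, mean_x = subject_means$x)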

Eoin
    You beat me to it :) Nice simulation too (+1) – Robert Long Jul 23 '20 at 11:16
  • Thank you, your answer was very helpful! – tormad Jul 28 '20 at 21:39
  • But covariate x can vary both within and between subjects. Thus, in your mixed-effects model, the x coefficient is a combination of two different effects (within- and between-subject). If these are not separated from each other, x can indeed produce a Simpson's paradox. However, if one separates these effects, the two sources of effect can be distinguished very clearly even in OLS regression, preventing a Simpson's paradox. HERE is a resource. – Reza Sep 09 '21 at 19:25
  • Yes. Simpson's paradox arises when within- and between-subject/group effects are not properly separated, and including random intercepts is a way of separating them. – Eoin Sep 12 '21 at 09:16
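
To make the exchange above concrete, here is a minimal sketch of the within/between decomposition the comments describe, applied to the simulated data frame from the accepted answer; the names x_between and x_within are only illustrative.

# Person-mean centering: split x into a between-subject part (each subject's
# mean stimulus) and a within-subject part (deviation from that mean).
data_centered = data %>%
  group_by(subject) %>%
  mutate(x_between = mean(x),
         x_within  = x - x_between) %>%
  ungroup()

# Plain OLS with the two components as separate predictors recovers the
# negative within-subject slope alongside the positive between-subject trend.
lm(y ~ x_between + x_within, data = data_centered) %>% summary() %>% coef()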

The accepted answer is absolutely correct, but how much the fixed effect changes when the random effect is included can be very sensitive to the number of repeated measurements per subject. For instance, in my own analysis (data plotted below), including a random intercept does not change the slope of the fixed effect (2 trials per subject). But after artificially enlarging the number of trials per subject by binding a clone of the observed data to itself (4 trials per subject), the effect described above does appear.

[Figure: Simpson's paradox is present, but there are not enough observations per subject to change the fixed effect]
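
A rough way to probe this point with the simulated data from the accepted answer (not the data plotted above; this is only a sketch): subset to 2 trials per subject, then stack a copy of those rows to get 4 trials per subject. The row-doubling fabricates information and is purely for illustration.

# Keep only 2 trials per subject from the simulated data above.
few_trials = data %>%
  group_by(subject) %>%
  slice_head(n = 2) %>%
  ungroup()

lmer(y ~ x + (1|subject), data = few_trials) %>% summary() %>% coef()

# "Clone" the data onto itself to get 4 trials per subject, as described
# above (this does not create real information; illustration only).
doubled = bind_rows(few_trials, few_trials)
lmer(y ~ x + (1|subject), data = doubled) %>% summary() %>% coef()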

aws