Why do parameter estimates change when a different reference is set in generalized linear models?

Question

I fit a binomial regression with Survival as the response and Time, Life.Stage and Trial as predictors. The R output reports a coefficient for each of the Life.Stage levels "B", "C" and "D", each compared against the intercept, which is estimated using "A" as the baseline.

E.g.:

mod=glm((cbind(Alive, Dead))~Time+Life.Stage+Trial, data=, family=binomial(link="logit"))

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)      4.8883     0.5361   9.118  < 2e-16 ***
Time            -1.5748     0.1886  -8.352  < 2e-16 ***
Life.StageD     10.9599     1.5906   6.891 5.56e-12 ***
Life.StageC      7.9772     1.3710   5.818 5.94e-09 ***
Life.StageB      5.2570     0.9619   5.465 4.63e-08 ***
factor(Trial)3   0.1628     0.4426   0.368    0.713    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 507.691  on 42  degrees of freedom
Residual deviance:  67.457  on 37  degrees of freedom
AIC: 104.44

Number of Fisher Scoring iterations: 7

Why, when I change the reference level to "B" or any other level, do the parameter estimates change?

I've looked at a similar question about ANOVA, but the explanation there is a bit beyond my understanding.

The ultimate goal of my analysis is to determine whether Survival at each Life.Stage differs from Survival at every other Life.Stage.

Haitao Du · Accepted Answer · 2017-04-05T17:19:06.100

Mathematically:

Because the model matrix is different. The coefficients are calculated from the model matrix.

Specifically, let $X$ be the model matrix. We calculate $\beta$ by solving the following problem (assuming linear regression with squared loss; this is not a GLM-specific issue but a more general one):

$$ \text{minimize} ~~\|X\beta-y\|^2 $$

If $X$ changes and $y$ stays the same, $\beta$ will change.
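
One way to make this precise is the following sketch. It assumes that $X$ has full column rank and that releveling produces a new model matrix $\tilde X = XT$ for some invertible matrix $T$. The symbols $\tilde X$, $T$ and $\tilde\beta$ are introduced here for illustration only, and $\beta$ denotes the minimizer for the original $X$:

$$ \min_{\tilde\beta}\|\tilde X\tilde\beta-y\|^2 = \min_{\tilde\beta}\|X(T\tilde\beta)-y\|^2 \quad\Rightarrow\quad T\tilde\beta=\beta, \qquad \tilde\beta=T^{-1}\beta, \qquad \tilde X\tilde\beta = X\beta $$

Under these assumptions the coefficients are transformed by $T^{-1}$, while the fitted values $X\beta$ stay the same.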

Here is a simplified example: linear regression on mtcars, with cyl as a categorical predictor. You can see that the model matrix changes after relevel.

> mtcars$cyl=factor(mtcars$cyl)

> head(model.matrix(mpg~cyl,mtcars))
                  (Intercept) cyl6 cyl8
Mazda RX4                   1    1    0
Mazda RX4 Wag               1    1    0
Datsun 710                  1    0    0
Hornet 4 Drive              1    1    0
Hornet Sportabout           1    0    1
Valiant                     1    1    0

> mtcars$cyl=relevel(mtcars$cyl,2)

> head(model.matrix(mpg~cyl,mtcars))
                  (Intercept) cyl4 cyl8
Mazda RX4                   1    0    0
Mazda RX4 Wag               1    0    0
Datsun 710                  1    1    0
Hornet 4 Drive              1    0    0
Hornet Sportabout           1    0    1
Valiant                     1    0    0
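
As a minimal sketch of what this means for the estimates, the code below fits the same linear regression under both baselines and prints the coefficients. The names `cars`, `fit4` and `fit6` are hypothetical and chosen for illustration; the code works on a copy so that `mtcars` itself is not modified.

```r
# Work on a copy so the built-in mtcars data set is left untouched
cars <- mtcars
cars$cyl <- factor(cars$cyl)              # levels "4", "6", "8"; baseline is "4"

# Fit with "4" as the reference level
fit4 <- lm(mpg ~ cyl, data = cars)        # coefficients: (Intercept), cyl6, cyl8

# Change the reference level to "6" and refit
cars$cyl <- relevel(cars$cyl, ref = "6")
fit6 <- lm(mpg ~ cyl, data = cars)        # coefficients: (Intercept), cyl4, cyl8

# The two sets of estimates differ in both names and values
print(coef(fit4))
print(coef(fit6))
```

The response and the data are identical in both fits; only the columns of the model matrix, and therefore the meaning of each coefficient, have changed.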

Intuitively:

The change in coefficients after relevel is also intuitive, because we are comparing against a different base level. In the example above, without relevel each coefficient compares a car against a 4-cylinder car; after relevel, it compares a car against a 6-cylinder car. Changing the reference therefore changes the difference in MPG that each coefficient measures.

For example, suppose I have a car with 8 cylinders. A coefficient that describes how it compares to a 4-cylinder car will differ from one that describes how it compares to a 6-cylinder car.
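
The 8-cylinder comparison can be checked numerically. The sketch below assumes the same mtcars regression and uses hypothetical names (`cars`, `cars6`, `b4`, `b6`). It verifies that the coefficients under baseline "6" are simple differences of the coefficients under baseline "4".

```r
cars <- mtcars
cars$cyl <- factor(cars$cyl)                              # baseline "4"
cars6 <- transform(cars, cyl = relevel(cyl, ref = "6"))   # baseline "6"

b4 <- coef(lm(mpg ~ cyl, data = cars))    # (Intercept), cyl6, cyl8
b6 <- coef(lm(mpg ~ cyl, data = cars6))   # (Intercept), cyl4, cyl8

# 8 vs 6 cylinders equals (8 vs 4) minus (6 vs 4)
print(all.equal(unname(b6["cyl8"]), unname(b4["cyl8"] - b4["cyl6"])))

# 4 vs 6 cylinders is the negative of 6 vs 4
print(all.equal(unname(b6["cyl4"]), unname(-b4["cyl6"])))

# The new intercept is the old intercept plus the 6 vs 4 difference
print(all.equal(unname(b6["(Intercept)"]),
                unname(b4["(Intercept)"] + b4["cyl6"])))
```

Each coefficient is a contrast with the reference level, so switching the reference re-expresses the same group differences relative to a different group.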

I think I understand this simplified explanation; however, a model matrix for my model would be a little more complicated, considering there are multiple predictors. Essentially, β is estimated from the model matrix, and changing the matrix changes what information β is estimated from. Intuitively, this all makes sense because the estimate is effectively the difference between the reference level and the factor level being estimated. (Thanks!) — hamilthj, Apr 05 '17 at 17:18
@hamilthj from the simple example, you can see how the model matrix changes. Another related link is here: changing the encoding or reference level will affect the model matrix — Haitao Du, Apr 05 '17 at 17:21
A more abstract way of describing this is that the model itself depends only on a certain linear subspace, namely the column space of the design matrix. The parameters, however, depend not only on this subspace but also on the basis chosen for it, given by the specific variables used, that is, the columns themselves. Predictions from the model, for instance, depend only on the linear subspace, not on the chosen basis. You should always ask yourself whether what you compute is basis-independent or not! — kjetil b halvorsen, Apr 05 '17 at 17:41
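
The basis-independence point can be illustrated with a binomial GLM, as in the question. This is only a sketch: the original data are not available, so it uses `am` from mtcars as a stand-in binary response, and the names `cars`, `cars6`, `g4` and `g6` are hypothetical.

```r
cars <- mtcars
cars$cyl <- factor(cars$cyl)                              # baseline "4"
cars6 <- transform(cars, cyl = relevel(cyl, ref = "6"))   # baseline "6"

# Same logistic regression, two different reference levels
g4 <- glm(am ~ cyl, data = cars,  family = binomial(link = "logit"))
g6 <- glm(am ~ cyl, data = cars6, family = binomial(link = "logit"))

# Coefficients depend on the chosen basis (reference level) ...
print(coef(g4))
print(coef(g6))

# ... but fitted probabilities and the residual deviance do not
print(all.equal(fitted(g4), fitted(g6), tolerance = 1e-6))
print(all.equal(deviance(g4), deviance(g6), tolerance = 1e-6))
```

Quantities such as fitted values, deviance and AIC should therefore agree between the two fits up to numerical tolerance, whereas individual coefficients and their p-values answer different questions depending on the reference level.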
