Why does removing the intercept not change the predictions of a linear model in the presence of factor predictors?

Question

In a linear model that predicts birth rate (TFR) per country from per capita GDP, the country is encoded in "treatment coding", and there are several measurements (different years) per country. I would thus have thought that the first level represents the "reference intercept" and the predictions for only this first level should change when the intercept is removed from the model.

However, the predictions do not change for any country, if I remove the intercept:

> fit1 <- lm(TFR ~ logGDPpc + logGDPpc2 + 
                    country, data=x)
> fit2 <- lm(TFR ~ logGDPpc + logGDPpc2 + 
                    country - 1, data=x)
> max(abs(fit1$fitted.values - 
          fit2$fitted.values))
[1] 1.847411e-13

This also applies to the relative error of the differences:

> max(abs((fit1$fitted.values - 
   fit2$fitted.values)/fit2$fitted.values))
[1] 7.482906e-14

Is this the expected behavior? Why?

Computers have limited precision. A maximum difference of less than 2 / 10^13 is pretty small and can be considered negligible. — Russ Lenth, Mar 29 '22 at 13:27
Yes, that is my point: there is no difference, even for the reference level. Why? — cdalitz, Mar 29 '22 at 13:28
R is not tricked by the attempt to remove the intercept. If you remove the intercept, then R adds a dummy for the reference level. Instead of looking at the fitted values, it might be easier to look at the design matrix with the model.matrix function. These two models have the same design matrix. — dipetkov, Mar 29 '22 at 13:38
@dipetkov Just looked at the model.matrix, and this is indeed the explanation. Would you mind elaborating your comment into an answer which I can then accept, so that the question is marked as answered? — cdalitz, Mar 29 '22 at 13:42
I posted it as an answer because I made a mistake in the comment: it's an equivalent matrix, not the same matrix. Just reinforces the advice to always look at the design matrix. — dipetkov, Mar 29 '22 at 14:15
This is asked&answered before: https://stats.stackexchange.com/questions/416007/how-to-interpret-regression-function-with-categorical-variable with more info at https://stats.stackexchange.com/questions/130643/how-can-logistic-regression-have-a-factorial-predictor-and-no-intercept/130793#130793 and https://stats.stackexchange.com/questions/215779/removing-intercept-from-glm-for-multiple-factorial-predictors-only-works-for-fir/218034#218034 — kjetil b halvorsen, Mar 29 '22 at 14:56

Russ Lenth · Answer 1 · 2022-03-29T14:16:49.987

These are two parameterizations of the same model. Look at the regression coefficients for each model, and you will see the same number of parameters.

In the first model, the model intercept is the $y$ intercept for the first country, and the coefficients for the other countries are differences between that country's intercept and the first country's.

In the second model, the absence of the overall intercept causes R to create indicators for all of the countries. The regression coefficients for the countries will be their respective $y$ intercepts; so the first one's coefficient will equal the intercept of the first model.
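The correspondence between the two sets of coefficients can be checked directly. The following is a minimal sketch on simulated data, since the data set x from the question is not available; the names d, g, x_num, fit_a and fit_b are made up for illustration.

```r
set.seed(1)

# Hypothetical stand-in for the country data: 3 groups, 5 observations each
d <- data.frame(
  g = factor(rep(c("A", "B", "C"), each = 5)),
  x_num = rnorm(15)
)
d$y <- 1 + 0.5 * d$x_num + as.integer(d$g) + rnorm(15, sd = 0.1)

fit_a <- lm(y ~ x_num + g, data = d)      # with intercept
fit_b <- lm(y ~ x_num + g - 1, data = d)  # without intercept

# Both fits estimate the same number of parameters
same_size <- length(coef(fit_a)) == length(coef(fit_b))

# The intercept of the first fit is the reference level's coefficient in the second
ref_matches <- all.equal(
  unname(coef(fit_a)["(Intercept)"]),
  unname(coef(fit_b)["gA"])
)

# For another level: intercept + difference = that level's own intercept
level_b_matches <- all.equal(
  unname(coef(fit_a)["(Intercept)"] + coef(fit_a)["gB"]),
  unname(coef(fit_b)["gB"])
)

# The slope of the numeric predictor and the fitted values are unaffected
slope_matches <- all.equal(coef(fit_a)["x_num"], coef(fit_b)["x_num"])
fitted_match <- all.equal(unname(fitted(fit_a)), unname(fitted(fit_b)))

print(c(same_size, ref_matches, level_b_matches, slope_matches, fitted_match))
```

all.equal compares up to a small numerical tolerance, which is appropriate here because the two fits agree only up to floating point error.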

dipetkov · Accepted Answer · 2022-03-29T14:19:37.200

As @Russ Lenth points out, these are equivalent parametrizations of the same model.

Usually (in R) we specify models with a formula such as y ~ x1 + x2. It's very convenient. Under the hood, R uses the formula and the data to come up with the design matrix.

It's often helpful to look at the design matrix to figure out how R processed the inputs, esp. if the formula includes categorical variables, polynomials or other variable transformations. Use the model.matrix function to construct the design matrix explicitly.

sample_size <- 10
n_levels <- 3
x_cat <- factor(sample(1:n_levels, sample_size, replace = TRUE))
x_num <- rnorm(sample_size)
# We don't need a response to construct the design matrix.
# The two formulas correspond to equivalent design matrices.
# -> They are the same model and the fitted values are the same,
#    up to some negligible numerical differences.

# The design matrix with an intercept doesn't have a dummy variable for the reference level.
model.matrix(~ x_num + x_cat)
#>    (Intercept)      x_num x_cat2 x_cat3
#> 1            1 -0.9633999      0      1
#> 2            1  0.9592475      0      0
#> 3            1 -0.9279922      1      0
#> 4            1 -0.2097351      1      0
#> 5            1 -0.5812370      1      0
#> 6            1  0.6245961      0      1
#> 7            1 -0.9484379      0      1
#> 8            1 -0.8772716      0      1
#> 9            1  0.8568915      0      1
#> 10           1  1.6237805      0      0
#> attr(,"assign")
#> [1] 0 1 2 2
#> attr(,"contrasts")
#> attr(,"contrasts")$x_cat
#> [1] "contr.treatment"
# The design matrix without an intercept has a dummy variable for the reference level.
model.matrix(~ x_num + x_cat - 1)
#>         x_num x_cat1 x_cat2 x_cat3
#> 1  -0.9633999      0      0      1
#> 2   0.9592475      1      0      0
#> 3  -0.9279922      0      1      0
#> 4  -0.2097351      0      1      0
#> 5  -0.5812370      0      1      0
#> 6   0.6245961      0      0      1
#> 7  -0.9484379      0      0      1
#> 8  -0.8772716      0      0      1
#> 9   0.8568915      0      0      1
#> 10  1.6237805      1      0      0
#> attr(,"assign")
#> [1] 1 2 2 2
#> attr(,"contrasts")
#> attr(,"contrasts")$x_cat
#> [1] "contr.treatment"
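One way to check that two design matrices are "equivalent" is to confirm that they span the same column space. The sketch below is an illustration under assumptions, not part of the original answer: it rebuilds x_cat and x_num deterministically so that every level is guaranteed to occur, and the names X_with and X_without are introduced here.

```r
# Deterministic inputs so that all three levels are present
x_cat <- factor(rep(1:3, length.out = 10))
x_num <- seq(-1, 1, length.out = 10)

X_with <- model.matrix(~ x_num + x_cat)         # intercept, no dummy for level 1
X_without <- model.matrix(~ x_num + x_cat - 1)  # no intercept, dummy for every level

# The intercept column is the sum of all level dummies,
# so it is already contained in the span of X_without
intercept_is_sum <- all.equal(
  unname(X_with[, "(Intercept)"]),
  unname(rowSums(X_without[, c("x_cat1", "x_cat2", "x_cat3")]))
)

# If the two matrices span the same space, stacking them side by side
# does not increase the rank
rank_with <- qr(X_with)$rank
rank_without <- qr(X_without)$rank
rank_both <- qr(cbind(X_with, X_without))$rank

print(intercept_is_sum)
print(c(rank_with, rank_without, rank_both))
```

Because least squares projects the response onto the column space of the design matrix, equal column spaces imply equal fitted values, whatever the response is.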
