
Suppose I have one binary predictor $x$ and I fit an OLS model:

$$y = \alpha + \beta I(x=1) + \epsilon$$

Alternatively I can fit the following model

$$y = \beta_1 I(x=1) + \beta_2 I(x=0) + \epsilon$$

Why does the second model give me a higher R-Squared?

R example:

results <- numeric(1000)
for (i in 1:1000) {
  X <- rnorm(1000)
  b <- X > 0                  # binary predictor

  y <- rep(0, 1000)
  y[b == TRUE] <- 1
  y <- y + rnorm(1000)        # outcome: group indicator plus noise

  rsq0 <- summary(lm(y ~ b))$r.squared      # with intercept
  rsq1 <- summary(lm(y ~ b - 1))$r.squared  # without intercept

  results[i] <- rsq1 - rsq0
}

mean(results > 0) # This equals 1

– badmax

1 Answer


As Roland writes, it comes down to the definition of $R^2$ in models with vs. without an intercept. More specifically, the difference lies in how the model sum of squares (mss) is calculated.
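Written out, with fitted values $\hat{y}_i$ and residuals $e_i$, the two definitions that summary() uses amount to

$$R^2_{\text{with intercept}} = \frac{\sum_i (\hat{y}_i - \bar{\hat{y}})^2}{\sum_i (\hat{y}_i - \bar{\hat{y}})^2 + \sum_i e_i^2}, \qquad R^2_{\text{without intercept}} = \frac{\sum_i \hat{y}_i^2}{\sum_i \hat{y}_i^2 + \sum_i e_i^2},$$

as we can verify from the code below.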

We can inspect the code for summary() by simply typing summary.lm (without ()). Here are the relevant parts:

> summary.lm
function (object, correlation = FALSE, symbolic.cor = FALSE, 
    ...) 
{
    z <- object
    ...
    r <- z$residuals
    f <- z$fitted.values
    ...
        mss <- if (attr(z$terms, "intercept")) 
            sum((f - mean(f))^2)
        else sum(f^2)
        rss <- sum(r^2)
    ...
    ans <- z[c("call", "terms", if (!is.null(z$weights)) "weights")]
    ...
        ans$r.squared <- mss/(mss + rss)
    ans

}

The crucial point is the distinction on attr(z$terms, "intercept"), which evaluates to 1 (treated as TRUE) in your first model, but 0 (FALSE) in the second. In the first case, mss is calculated as the sum of squared differences between the fitted values and their overall mean; in the second case, it is simply the sum of the squared fitted values. We thus get different values for mss, while the rest of the calculation is identical.
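As a quick sanity check, we can read this attribute off directly from fitted models (a minimal sketch with toy data, using the same b and y names as in your question):

b <- c(TRUE, FALSE, TRUE, TRUE)
y <- rnorm(4)
attr(terms(lm(y ~ b)), "intercept")      # 1: intercept present
attr(terms(lm(y ~ b - 1)), "intercept")  # 0: intercept suppressed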

We can calculate an example and reconstruct the different $R^2$s by hand:

> nn <- 10
> set.seed(1)
> 
> X <- rnorm(nn)
> b <- X > 0
> y <- rep(0, nn)
> y[b == TRUE] <- 1
> y <- y + rnorm(nn)
>   
> model0 <- lm(y ~ b)
> model1 <- lm(y ~ b - 1)
> 
> mss0 <- sum((model0$fitted.values-mean(model0$fitted.values))^2)
> mss1 <- sum(model1$fitted.values^2)
> rss0 <- sum(model0$residuals^2)
> rss1 <- sum(model1$residuals^2)
> mss0/(mss0 + rss0)
[1] 0.1350045
> (rsq0 <- summary(model0)$r.squared)
[1] 0.1350045
> mss1/(mss1 + rss1)
[1] 0.4628321
> (rsq1 <- summary(model1)$r.squared)
[1] 0.4628321
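This also explains why the no-intercept $R^2$ is systematically higher in your simulation. Both models fit the same group means, so their fitted values (and hence rss) are identical, and by the usual variance decomposition

$$\sum_i \hat{y}_i^2 = \sum_i (\hat{y}_i - \bar{\hat{y}})^2 + n\,\bar{\hat{y}}^2 \;\geq\; \sum_i (\hat{y}_i - \bar{\hat{y}})^2,$$

so mss1 $\geq$ mss0, with strict inequality whenever the mean of the fitted values is nonzero, as it is here.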
– Stephan Kolassa
  • That was a great answer by Stephan but, to be more succinct, it's really not correct to calculate $R^2$ in a no-intercept model because the sum of squares decomposition no longer holds. – mlofton Sep 12 '20 at 13:32