
Suppose I have one binary predictor $x$ and I fit an OLS model:

$$y = \alpha + \beta I(x=1) + \epsilon$$

Alternatively I can fit the following model

$$y = \beta_1 I(x=1) + \beta_2 I(x=0) + \epsilon$$

Why does the second model give me a higher R-Squared?

R example:

results <- numeric(1000)
for (i in 1:1000) {
  X <- rnorm(1000)
  b <- X > 0                  # binary predictor

  y <- rep(0, 1000)
  y[b == TRUE] <- 1
  y <- y + rnorm(1000)        # outcome: group indicator plus noise

  rsq0 <- summary(lm(y ~ b))$r.squared      # with intercept
  rsq1 <- summary(lm(y ~ b - 1))$r.squared  # without intercept

  results[i] <- rsq1 - rsq0
}

mean(results > 0) # This equals 1

– badmax

1 Answer


As Roland writes, it comes down to the definition of $R^2$ in models with vs. without an intercept. More specifically, the difference lies in how the model sum of squares (mss) is calculated.
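Written out, with fitted values $\hat{y}_i$ and residuals $e_i$, the two definitions that summary() uses amount to

$$R^2_{\text{with intercept}} = \frac{\sum_i (\hat{y}_i - \bar{\hat{y}})^2}{\sum_i (\hat{y}_i - \bar{\hat{y}})^2 + \sum_i e_i^2}, \qquad R^2_{\text{without intercept}} = \frac{\sum_i \hat{y}_i^2}{\sum_i \hat{y}_i^2 + \sum_i e_i^2},$$

as we can verify from the code below.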

We can inspect the code for summary() by simply typing summary.lm (without ()). Here are the relevant parts:

> summary.lm
function (object, correlation = FALSE, symbolic.cor = FALSE, 
    ...) 
{
    z <- object
    ...
    r <- z$residuals
    f <- z$fitted.values
    ...
        mss <- if (attr(z$terms, "intercept")) 
            sum((f - mean(f))^2)
        else sum(f^2)
        rss <- sum(r^2)
    ...
    ans <- z[c("call", "terms", if (!is.null(z$weights)) "weights")]
    ...
        ans$r.squared <- mss/(mss + rss)
    ans

}

The crucial point is the distinction on attr(z$terms, "intercept"), which evaluates to 1 (treated as TRUE) in your first model, but 0 (FALSE) in the second. In the first case, mss is calculated as the sum of squared differences between the fitted values and their overall mean; in the second case, it is simply the sum of the squared fitted values. We thus get different values for mss, while the rest of the calculation is identical.
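As a quick sanity check, we can read this attribute off directly from fitted models (a minimal sketch with toy data, using the same b and y names as in your question):

b <- c(TRUE, FALSE, TRUE, TRUE)
y <- rnorm(4)
attr(terms(lm(y ~ b)), "intercept")      # 1: intercept present
attr(terms(lm(y ~ b - 1)), "intercept")  # 0: intercept suppressed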

We can calculate an example and reconstruct the different $R^2$s by hand:

> nn <- 10
> set.seed(1)
> 
> X <- rnorm(nn)
> b <- X > 0
> y <- rep(0, nn)
> y[b == TRUE] <- 1
> y <- y + rnorm(nn)
>   
> model0 <- lm(y ~ b)
> model1 <- lm(y ~ b - 1)
> 
> mss0 <- sum((model0$fitted.values-mean(model0$fitted.values))^2)
> mss1 <- sum(model1$fitted.values^2)
> rss0 <- sum(model0$residuals^2)
> rss1 <- sum(model1$residuals^2)
> mss0/(mss0 + rss0)
[1] 0.1350045
> (rsq0 <- summary(model0)$r.squared)
[1] 0.1350045
> mss1/(mss1 + rss1)
[1] 0.4628321
> (rsq1 <- summary(model1)$r.squared)
[1] 0.4628321
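This also explains why the no-intercept $R^2$ is systematically higher in your simulation. Both models fit the same group means, so their fitted values (and hence rss) are identical, and by the usual variance decomposition

$$\sum_i \hat{y}_i^2 = \sum_i (\hat{y}_i - \bar{\hat{y}})^2 + n\,\bar{\hat{y}}^2 \;\geq\; \sum_i (\hat{y}_i - \bar{\hat{y}})^2,$$

so mss1 $\geq$ mss0, with strict inequality whenever the mean of the fitted values is nonzero, as it is here.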
– Stephan Kolassa
  • That was a great answer by Stephan but, to be more succinct, it's really not correct to calculate $R^2$ in a no-intercept model because the sum of squares decomposition no longer holds. – mlofton Sep 12 '20 at 13:32