ANOVA: RSS > TSS?

Question

I am sure that this is a simple question, but I can't figure out what my mistake is. I am doing an ANOVA project using car::Anova. I presumed that the TSS would be the sum of the sums of squares in the ANOVA table, i.e. TSS = ESS + RSS, but that does not seem to be the case.

Then I thought this could be due to the intercept, but removing the intercept actually made things worse: now I have RSS > TSS (which should be impossible, right?)

Below is a reproducible example:

library(car)
library(dplyr)
library(broom)
y = rnorm(100, 10, 10)
x1 = rnorm(100, 10, 10)
x2 = rnorm(100, -2, 4)

Part 1: TSS when using Anova is not TSS when calculating by hand?

# TSS = 10735.8
var(y) * 99
# TSS = 10743.96
lm(y ~ x1 + x2) %>% Anova(type = 2) %>% tidy() %>% pull(sumsq) %>% sum()

Part 2: TSS < RSS

# RSS = 15650.71 (> TSS?)
lm(y ~ x1 - 1) %>% summary() %>% .$residuals %>% .^2 %>% sum()

Thanks in advance!

dipetkov · Answer 1 · 2022-07-02T11:18:01.647

You've made a statistical mistake: to decompose the total sum of squares (TSS) into the explained sum of squares (ESS) and the residual sum of squares (RSS), you want ANOVA type I, not ANOVA type II.

ANOVA type I: Use x1 to predict y, then adjust x2 for x1 and use the remainder to predict y.
ANOVA type II: Use x1 adjusted for x2 to predict y; use x2 adjusted for x1 to predict y.
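A minimal sketch of what "sequential" means in practice, assuming simulated data in which x2 is deliberately correlated with x1 (the seed and the coefficient 0.8 are arbitrary choices for illustration, not from the question). Type I sums of squares depend on the order of the terms in the formula, yet either order partitions the same TSS.

```r
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)   # x2 is correlated with x1 by construction
y <- 1 + x1 + x2 + rnorm(n)

# Type I with x1 first: x1 alone, then x2 adjusted for x1
ss_12 <- anova(lm(y ~ x1 + x2))[["Sum Sq"]]
# Type I with the order reversed: x2 alone, then x1 adjusted for x2
ss_21 <- anova(lm(y ~ x2 + x1))[["Sum Sq"]]

# The x1 rows differ: x1 is adjusted for x2 only when it is entered last
c(x1_first = ss_12[1], x1_last = ss_21[2])

# Both orders add up to the same total sum of squares
tss <- sum((y - mean(y))^2)
c(order_12 = sum(ss_12), order_21 = sum(ss_21), TSS = tss)
```

The x1 value from the second fit (x1 entered last) is what a type II table reports for x1 in an additive model like this one.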

What do we mean by adjusting a variable x2 for another variable x1? For intuition, take height and weight: taller people (adults) weigh more than shorter people (children), so if we know height we have (partial) information about weight. That is, we can predict weight given height because the two traits are correlated. To adjust weight for height, we subtract the component of weight that's explained by height; the remainder is "adjusted". In R we can do this with residuals(lm(weight ~ height)).
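To make "adjusting" concrete, here is a small sketch with simulated heights and weights. The numbers (mean height 170, slope 0.9, noise sd 8) are made up for illustration and are not real anthropometric data. Because the adjusted weight is the residual from a regression with an intercept, it has mean zero and is uncorrelated with height by construction.

```r
set.seed(42)
n <- 200
height <- rnorm(n, mean = 170, sd = 10)                    # hypothetical heights (cm)
weight <- -80 + 0.9 * height + rnorm(n, mean = 0, sd = 8)  # hypothetical weights (kg)

# Remove the part of weight that height explains; what is left is "adjusted" weight
weight_adj <- residuals(lm(weight ~ height))

cor(weight, height)      # clearly positive
cor(weight_adj, height)  # zero up to floating-point error
mean(weight_adj)         # zero up to floating-point error
```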

Separately from the confusion about ANOVA, you assume that RSS ≤ TSS always holds. It doesn't: a model can predict worse than the overall mean does (think of predicting values at random), and then its residual sum of squares is greater than the total sum of squares.

A least-squares regression with an intercept can't behave this way, because predicting mean(y) for every observation is one of the fits available to it, and that fit has RSS = TSS. Without an intercept, as in your y ~ x1 - 1, there is no such guarantee.
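A minimal sketch of the intercept's role, using a tiny hand-picked data set (chosen only to make the effect obvious, not taken from the question): x is centred on 0 while y is centred far from 0, so a line forced through the origin fits very badly. deviance() returns the residual sum of squares of an lm fit.

```r
# Toy data: x centred on 0, y centred around 11
x <- c(-2, -1, 0, 1, 2)
y <- c(10, 12, 11, 9, 13)

tss <- sum((y - mean(y))^2)            # 10
rss_int <- deviance(lm(y ~ x))         # with intercept: 9.1, cannot exceed TSS
rss_origin <- deviance(lm(y ~ x - 1))  # through the origin: 614.1, far above TSS

c(TSS = tss, RSS_intercept = rss_int, RSS_no_intercept = rss_origin)
```

Both fits estimate the same slope (0.3) here, but only the model with an intercept can shift its predictions up to the level of y.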

set.seed(1234)
n <- 100
y <- rnorm(n, 10, 10)
x1 <- rnorm(n, 10, 10)
x2 <- rnorm(n, -2, 4)
Total sum of squares
sum((y - mean(y))^2)
#> [1] 9987.417

ANOVA Type I: This is the ANOVA to use if you want to decompose the total sum of squares.

anova(lm(y ~ x1 + x2))
#> Analysis of Variance Table
#> 
#> Response: y
#>           Df Sum Sq Mean Sq F value Pr(>F)
#> x1         1    6.4   6.435  0.0630 0.8024
#> x2         1   71.4  71.373  0.6986 0.4053
#> Residuals 97 9909.6 102.161

You can verify that the Sum Sq column adds up to the total sum of squares: 6.4 + 71.4 + 9909.6 = 9987.4.
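The same check done programmatically, regenerating the simulated data from above so the block runs on its own. The names tss, ess and rss are just local variables for this sketch.

```r
set.seed(1234)
n <- 100
y <- rnorm(n, 10, 10)
x1 <- rnorm(n, 10, 10)
x2 <- rnorm(n, -2, 4)

tss <- sum((y - mean(y))^2)
ss_type1 <- anova(lm(y ~ x1 + x2))[["Sum Sq"]]  # rows: x1, x2, Residuals

# The sequential (type I) sums of squares add up to the TSS
all.equal(sum(ss_type1), tss)  # TRUE

# ESS is the sum of the term rows; RSS is the Residuals row
ess <- sum(ss_type1[1:2])
rss <- ss_type1[3]
```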

ANOVA Type II: Not the ANOVA you have in mind.

car::Anova(lm(y ~ x1 + x2))
#> Anova Table (Type II tests)
#> 
#> Response: y
#>           Sum Sq Df F value Pr(>F)
#> x1          11.0  1  0.1072 0.7440
#> x2          71.4  1  0.6986 0.4053
#> Residuals 9909.6 97

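Both ANOVAs have the same residual sum of squares in the bottom row, and the same x2 row, because x2 is entered last in the type I table and so is already adjusted for x1. But the x1 row is different: it's the explained SS for the model y ~ x1 in the ANOVA type I table and the explained SS for the model y ~ residuals(lm(x1 ~ x2)) in the ANOVA type II table. That's why the type II Sum Sq column (11.0 + 71.4 + 9909.6 = 9992.0) doesn't add up to the TSS.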
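Both x1 rows can be reproduced from the residual sums of squares of nested models using only base R, since deviance() returns the RSS of an lm fit. A sketch on the same simulated data: the type I row compares y ~ 1 with y ~ x1, while the type II row compares y ~ x2 with y ~ x1 + x2, i.e. it measures what x1 adds once x2 is already in the model.

```r
set.seed(1234)
n <- 100
y <- rnorm(n, 10, 10)
x1 <- rnorm(n, 10, 10)
x2 <- rnorm(n, -2, 4)

# Type I row for x1: improvement over the intercept-only model
ss_x1_type1 <- deviance(lm(y ~ 1)) - deviance(lm(y ~ x1))
# Type II row for x1: improvement over the model that already contains x2
ss_x1_type2 <- deviance(lm(y ~ x2)) - deviance(lm(y ~ x1 + x2))

# Same value as the explained SS when regressing y on x1 adjusted for x2
r1 <- residuals(lm(x1 ~ x2))
ss_x1_adjusted <- anova(lm(y ~ r1))[1, "Sum Sq"]

c(type1 = ss_x1_type1, type2 = ss_x1_type2, adjusted = ss_x1_adjusted)
```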

# Predict y given both x1 and x2
yhat <- predict(lm(y ~ x1 + x2))
# Residual sum of squares
sum((y - yhat)^2)
#> [1] 9909.61

A bad model that always predicts 10 has a larger residual sum of squares than the TSS.

yhat <- rep(10, n)
# Residual sum of squares
sum((y - yhat)^2)
#> [1] 10233.16
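Why does the constant 10 lose to the sample mean, even though 10 is the mean of the distribution the data were drawn from? For any constant c, sum((y - c)^2) = TSS + n * (mean(y) - c)^2, so every constant other than mean(y) gives an RSS strictly larger than the TSS. A quick numerical check; rss_const is a hypothetical helper defined only for this sketch.

```r
set.seed(1234)
n <- 100
y <- rnorm(n, 10, 10)

tss <- sum((y - mean(y))^2)
rss_const <- function(const) sum((y - const)^2)  # RSS of a model that always predicts const

# The excess over the TSS is exactly n * (mean(y) - const)^2
rss_const(10) - tss
n * (mean(y) - 10)^2
```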

Appendix

There are three commonly used types of ANOVA sums of squares (type I, II and III). To learn more about them, see the question "How to interpret type I, type II, and type III ANOVA and MANOVA?"

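Explicit ANOVA type II calculations by adjusting x1 for x2 and x2 for x1. Only the Sum Sq values should match the type II table; the F values and p-values differ slightly because each single-predictor regression has its own residual sum of squares and degrees of freedom: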

car::Anova(lm(y ~ x1 + x2))
#> Anova Table (Type II tests)
#> 
#> Response: y
#>           Sum Sq Df F value Pr(>F)
#> x1          11.0  1  0.1072 0.7440
#> x2          71.4  1  0.6986 0.4053
#> Residuals 9909.6 97
anova(lm(y ~ residuals(lm(x1 ~ x2))))[1, ]
#> Analysis of Variance Table
#> 
#> Response: y
#>                        Df Sum Sq Mean Sq F value Pr(>F)
#> residuals(lm(x1 ~ x2))  1 10.955  10.955  0.1076 0.7436
anova(lm(y ~ residuals(lm(x2 ~ x1))))[1, ]
#> Analysis of Variance Table
#> 
#> Response: y
#>                        Df Sum Sq Mean Sq F value Pr(>F)
#> residuals(lm(x2 ~ x1))  1 71.373  71.373  0.7054  0.403
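Base R can also produce these type II sums of squares without car, at least for an additive model like this one: drop1() refits the model without each term in turn and reports the increase in RSS in its "Sum of Sq" column. For models with interactions, drop1() by default only drops terms allowed by marginality, so treat this as a sketch for the main-effects case only.

```r
set.seed(1234)
n <- 100
y <- rnorm(n, 10, 10)
x1 <- rnorm(n, 10, 10)
x2 <- rnorm(n, -2, 4)

fit <- lm(y ~ x1 + x2)
# Each row: increase in RSS when that single term is removed from the full model
d <- drop1(fit)
d[c("x1", "x2"), "Sum of Sq"]
```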
