You've made a statistical mistake: to decompose the total sum of squares (TSS) into the explained sum of squares (ESS) and the residual sum of squares (RSS), you want ANOVA type I (sequential sums of squares), not ANOVA type II.
- ANOVA type I: Use x1 to predict y, then adjust x2 for x1 and use the remainder to predict y.
- ANOVA type II: Use x1 adjusted for x2 to predict y; use x2 adjusted for x1 to predict y.
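The same two recipes can be written as differences of residual sums of squares from nested least-squares fits. This is the standard two-predictor case; the notation is mine, not the source's. RSS(·) is the RSS of the model with the listed predictors plus an intercept, and RSS(∅) = TSS is the RSS of the intercept-only model:

$$
\begin{aligned}
\text{Type I:}\quad SS(x_1) &= \mathrm{RSS}(\varnothing) - \mathrm{RSS}(x_1)\\
SS(x_2 \mid x_1) &= \mathrm{RSS}(x_1) - \mathrm{RSS}(x_1, x_2)\\
\text{Type II:}\quad SS(x_1 \mid x_2) &= \mathrm{RSS}(x_2) - \mathrm{RSS}(x_1, x_2)\\
SS(x_2 \mid x_1) &= \mathrm{RSS}(x_1) - \mathrm{RSS}(x_1, x_2)
\end{aligned}
$$

The type I terms telescope: SS(x1) + SS(x2 | x1) + RSS(x1, x2) = RSS(∅) = TSS. The type II terms share the x2 term but replace SS(x1) with SS(x1 | x2), so in general they don't add up to the TSS.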
What do we mean by adjusting a variable x2 for another variable x1? For intuition, take height and weight. Taller people (adults) weigh more than shorter people (children), so knowing someone's height gives us partial information about their weight. That is, we can predict weight from height because the two traits are correlated. To adjust weight for height, we subtract the component of weight that height explains; the remainder is weight adjusted for height. In R we can do this with residuals(lm(weight ~ height)).
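Here is a minimal sketch of adjusting weight for height. The heights and weights are simulated and hypothetical, not real data. The residuals from lm(weight ~ height) keep only the part of weight that height does not explain, so they carry no linear information about height:

```r
# Hypothetical data: weight depends linearly on height plus noise
set.seed(1)
height <- rnorm(200, mean = 170, sd = 10)      # cm
weight <- -100 + height + rnorm(200, sd = 8)   # kg
# Adjust weight for height: subtract the part of weight that height predicts
weight_adj <- residuals(lm(weight ~ height))
# Least-squares residuals are orthogonal to the predictor,
# so this correlation is zero up to floating-point error
cor(weight_adj, height)
```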
Separately from the confusion about ANOVA, you assume that RSS ≤ TSS always holds. It doesn't hold for arbitrary predictions. A model can do worse than predicting the overall mean (think of predicting values at random), and then the residual sum of squares is greater than the total sum of squares.
A least-squares regression that includes an intercept can't behave this way. The intercept-only model, whose RSS is exactly the TSS, is nested within it, so the fitted model's RSS can only be smaller or equal.
```r
# Simulate a response and two predictors, all drawn independently
set.seed(1234)
n <- 100
y <- rnorm(n, 10, 10)
x1 <- rnorm(n, 10, 10)
x2 <- rnorm(n, -2, 4)
```
The total sum of squares:
```r
sum((y - mean(y))^2)
#> [1] 9987.417
```
ANOVA Type I: This is the ANOVA to use if you want to decompose the total sum of squares.
```r
anova(lm(y ~ x1 + x2))
#> Analysis of Variance Table
#>
#> Response: y
#>           Df Sum Sq Mean Sq F value Pr(>F)
#> x1         1    6.4   6.435  0.0630 0.8024
#> x2         1   71.4  71.373  0.6986 0.4053
#> Residuals 97 9909.6 102.161
```
You can verify that the Sum Sq column adds up to the TSS: 6.4 + 71.4 + 9909.6 = 9987.4.
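The same check can be done programmatically rather than from the rounded printout. This is a minimal sketch with the same simulated data; `tss` and `ss_type1` are my own variable names. It relies only on base R's `anova()`, whose result has a "Sum Sq" column:

```r
set.seed(1234)
n <- 100
y <- rnorm(n, 10, 10)
x1 <- rnorm(n, 10, 10)
x2 <- rnorm(n, -2, 4)

tss <- sum((y - mean(y))^2)
# Type I (sequential) sums of squares, including the Residuals row
ss_type1 <- anova(lm(y ~ x1 + x2))[["Sum Sq"]]
# The sequential rows add back up to the total sum of squares
all.equal(sum(ss_type1), tss)
#> [1] TRUE
```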
ANOVA Type II: Not the ANOVA you have in mind.
```r
car::Anova(lm(y ~ x1 + x2))
#> Anova Table (Type II tests)
#>
#> Response: y
#>           Sum Sq Df F value Pr(>F)
#> x1          11.0  1  0.1072 0.7440
#> x2          71.4  1  0.6986 0.4053
#> Residuals 9909.6 97
```
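To see where the type II numbers come from without the car package, here is a sketch that rebuilds the Sum Sq column from nested fits. Each term's SS is the drop in RSS when that term is added last. It uses base R's `deviance()`, which for an `lm` fit returns the residual sum of squares; the variable names are my own. Unlike the type I column, these rows don't add up to the TSS:

```r
set.seed(1234)
n <- 100
y <- rnorm(n, 10, 10)
x1 <- rnorm(n, 10, 10)
x2 <- rnorm(n, -2, 4)

# deviance() of an lm fit is its residual sum of squares
rss_full <- deviance(lm(y ~ x1 + x2))
# Type II: each term's SS is the reduction in RSS when it enters last
ss_type2 <- c(
  x1 = deviance(lm(y ~ x2)) - rss_full,
  x2 = deviance(lm(y ~ x1)) - rss_full,
  Residuals = rss_full
)
round(ss_type2, 1)
# The type II rows need not sum to the total sum of squares
tss <- sum((y - mean(y))^2)
sum(ss_type2) - tss
```

The rounded values should match the car::Anova table above.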
Both ANOVAs have the same residual sum of squares in the bottom row. They also have the same x2 row, because in both tables x2 is adjusted for x1. The first row differs, however. In the type I table it's the explained SS of the model y ~ x1. In the type II table it's the explained SS of the model y ~ residuals(lm(x1 ~ x2)), that is, of x1 adjusted for x2.
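The type I half of that claim can be checked directly. This is a minimal sketch with the same simulated data; `fit1` is my own name. It computes the explained SS of y ~ x1 in two equivalent ways. It also prints the sample correlation of x1 and x2, which is not exactly zero even though they were drawn independently. That is why adjusting x1 for x2 changes its sum of squares:

```r
set.seed(1234)
n <- 100
y <- rnorm(n, 10, 10)
x1 <- rnorm(n, 10, 10)
x2 <- rnorm(n, -2, 4)

# Sample correlation of the two predictors: not exactly zero
cor(x1, x2)
# First row of the type I table: explained SS of y ~ x1 alone
fit1 <- lm(y ~ x1)
sum((fitted(fit1) - mean(y))^2)
# Equivalent: TSS minus the RSS of y ~ x1
sum((y - mean(y))^2) - deviance(fit1)
```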
```r
# Predict y given both x1 and x2
yhat <- predict(lm(y ~ x1 + x2))
# Residual sum of squares
sum((y - yhat)^2)
#> [1] 9909.61
```
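Why can't this fitted RSS exceed the TSS? Here is a minimal sketch with the same simulated data; `rss_null` is my own name. The intercept-only model y ~ 1 predicts mean(y) for every observation, so its RSS, as returned by `deviance()`, is exactly the TSS. A least-squares fit that adds predictors to it can only lower the RSS:

```r
set.seed(1234)
n <- 100
y <- rnorm(n, 10, 10)
x1 <- rnorm(n, 10, 10)
x2 <- rnorm(n, -2, 4)

# The intercept-only model predicts mean(y) everywhere: its RSS is the TSS
rss_null <- deviance(lm(y ~ 1))
all.equal(rss_null, sum((y - mean(y))^2))
#> [1] TRUE
# Adding predictors to a least-squares fit with an intercept cannot raise the RSS
deviance(lm(y ~ x1 + x2)) <= rss_null
#> [1] TRUE
```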
A bad model that always predicts 10 has a residual sum of squares larger than the TSS:
```r
yhat <- rep(10, n)
# Residual sum of squares
sum((y - yhat)^2)
#> [1] 10233.16
```
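The excess has a simple form. For any constant prediction c, the RSS decomposes as TSS + n(ȳ − c)², because the cross term vanishes. Any constant other than mean(y) therefore does strictly worse than the TSS. This is a minimal sketch of that identity with the same simulated data; `c0` and `rss_const` are my own names:

```r
set.seed(1234)
n <- 100
y <- rnorm(n, 10, 10)

tss <- sum((y - mean(y))^2)
c0 <- 10                       # the constant prediction used above
rss_const <- sum((y - c0)^2)
# RSS of a constant prediction = TSS + n * (mean(y) - c0)^2
all.equal(rss_const, tss + n * (mean(y) - c0)^2)
#> [1] TRUE
```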
Appendix
There are three types of sums of squares for ANOVA (type I, type II and type III). To learn more about them, see "How to interpret type I, type II, and type III ANOVA and MANOVA?"
Explicit ANOVA type II calculations by adjusting x1 for x2 and x2 for x1:
```r
car::Anova(lm(y ~ x1 + x2))
#> Anova Table (Type II tests)
#>
#> Response: y
#>           Sum Sq Df F value Pr(>F)
#> x1          11.0  1  0.1072 0.7440
#> x2          71.4  1  0.6986 0.4053
#> Residuals 9909.6 97
```
```r
anova(lm(y ~ residuals(lm(x1 ~ x2))))[1, ]
#> Analysis of Variance Table
#>
#> Response: y
#>                        Df Sum Sq Mean Sq F value Pr(>F)
#> residuals(lm(x1 ~ x2))  1 10.955  10.955  0.1076 0.7436
```
```r
anova(lm(y ~ residuals(lm(x2 ~ x1))))[1, ]
#> Analysis of Variance Table
#>
#> Response: y
#>                        Df Sum Sq Mean Sq F value Pr(>F)
#> residuals(lm(x2 ~ x1))  1 71.373  71.373  0.7054  0.403
```
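The sums of squares match the type II table, but the F values differ slightly (0.1076 vs 0.1072). The single-regressor fit divides by its own residual mean square, whereas the type II test divides by the residual mean square of the full model y ~ x1 + x2. This is a minimal base-R sketch of recovering the type II F value; `ss_x1` and `ms_res_full` are my own names:

```r
set.seed(1234)
n <- 100
y <- rnorm(n, 10, 10)
x1 <- rnorm(n, 10, 10)
x2 <- rnorm(n, -2, 4)

full <- lm(y ~ x1 + x2)
x1_adj <- residuals(lm(x1 ~ x2))
# Sum of squares explained by x1 adjusted for x2 (the type II x1 row)
ss_x1 <- anova(lm(y ~ x1_adj))[1, "Sum Sq"]
# Type II F tests use the residual mean square of the FULL model
ms_res_full <- deviance(full) / df.residual(full)
ss_x1 / ms_res_full
```

This should reproduce the F value of 0.1072 in the type II table.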