
I wanted to do some simple sum-of-squares calculations for linear regression, using a very simple example.

1) Let's assume that we have 3 observations [x,y]: [[1,1], [2,2], [3,3]].

2) I created a simple regression model which assigns 3 to each x, so it is a flat line.

3) For that regression line I wanted to calculate the total sum of squares (TSS), the residual sum of squares (RSS) and the explained sum of squares (ESS).

My calculations (the mean of y is 2):

TSS = (1 - 2)^2 + (2 - 2)^2 + (3 - 2)^2 = 2
RSS = (1 - 3)^2 + (2 - 3)^2 + (3 - 3)^2 = 5
ESS = (3 - 2)^2 + (3 - 2)^2 + (3 - 2)^2 = 3

As far as I know, TSS = RSS + ESS and here we have:

2 = 5 + 3

I can't figure out what I did wrong. Or is TSS = RSS + ESS applicable only under certain conditions?

michalk

2 Answers


TSS = RSS + ESS only applies when the sum of the residuals (not the sum of the squared residuals, mind) is 0 (or equivalently, when the residuals have mean 0). In your example, all residuals are positive, and therefore their sum is also positive, and not 0, so the equality does not hold here.

Note that the sum and mean of the residuals are guaranteed to be 0 when the regression model includes an intercept, and has been fit to the data using ordinary least squares. This is typical in practice, and explains the relevance of the TSS = RSS + ESS formula.

(In your example, you could make the residuals have mean 0 by changing your model to predict a constant value of 2 rather than 3. If you retry your calculations with that model, you'll see the equality does hold in that case.)
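The two models can be compared with a quick numeric check. This is a minimal Python sketch of my own (not from the original post); the helper computes the three sums of squares for any set of predictions:

```python
def sums_of_squares(y, y_hat):
    """Return (TSS, RSS, ESS) for observations y and predictions y_hat."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)                  # total sum of squares
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))     # residual sum of squares
    ess = sum((fi - y_bar) ** 2 for fi in y_hat)              # explained sum of squares
    return tss, rss, ess

y = [1, 2, 3]  # the observed y values from the question

# Original model: predict a constant 3 -> residuals do not sum to 0
print(sums_of_squares(y, [3, 3, 3]))   # (2.0, 5, 3.0): 2 != 5 + 3

# Adjusted model: predict a constant 2 (the mean) -> residuals sum to 0
print(sums_of_squares(y, [2, 2, 2]))   # (2.0, 2, 0.0): 2 == 2 + 0
```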

Edit: In response to David88's follow-up question, I'd like to clarify that, as Scortchi also pointed out, the residuals having mean 0 (or summing to 0) is a necessary, but not a sufficient, condition for the TSS = RSS + ESS equation to hold.

In particular, an arbitrary set of predicted values that has the same mean as the data it is intended to predict can easily be a terrible fit to that data, such that the RSS is actually larger (i.e. worse) than the TSS. That is, not only have we not "explained" any variance, we have constructed a model that is worse than the data's own mean. Arguably, such a model "explains negative variance": it only takes us further away from understanding or predicting the data.

As long as we fit our model to our data, and we use a reasonable fitting procedure, we will typically have RSS $\le$ TSS. However, this still isn't sufficient to guarantee that TSS = RSS + ESS. For instance, if we have data [-2, -1, 0, 1, 2] and predictions [-1, -0.5, 0, 0.5, 1], then TSS = 10, while RSS = ESS = 2.5, and so RSS + ESS = 5 $\neq$ 10.
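That failing example can be verified directly. A small Python check of my own (not part of the original answer):

```python
# Residuals have mean 0 here, yet TSS = RSS + ESS still fails.
y     = [-2, -1, 0, 1, 2]
y_hat = [-1, -0.5, 0, 0.5, 1]

y_bar = sum(y) / len(y)                                   # 0.0
residuals = [yi - fi for yi, fi in zip(y, y_hat)]
print(sum(residuals))                                     # 0.0: mean-zero residuals

tss = sum((yi - y_bar) ** 2 for yi in y)                  # 10.0
rss = sum(r ** 2 for r in residuals)                      # 2.5
ess = sum((fi - y_bar) ** 2 for fi in y_hat)              # 2.5
print(tss, rss + ess)                                     # 10.0 5.0 -> not equal
```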

For the equality to hold, we further require that the residuals are orthogonal to the predictions. That is, there is no more variance that the model could possibly explain. This condition is met when we fit a linear regression using the standard OLS equation, or if we use an iterative fitting procedure that converges to the global minimum.
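The exact condition can be seen by expanding the decomposition (a standard derivation, added here for completeness):

```latex
\begin{align}
\mathrm{TSS} &= \sum_i (y_i - \bar{y})^2
              = \sum_i \bigl((y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})\bigr)^2 \\
             &= \underbrace{\sum_i (y_i - \hat{y}_i)^2}_{\mathrm{RSS}}
              + \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\mathrm{ESS}}
              + 2\sum_i (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})
\end{align}
```

So TSS = RSS + ESS holds exactly when the cross term vanishes. Writing $e_i = y_i - \hat{y}_i$, the cross term equals $2\left(\sum_i e_i \hat{y}_i - \bar{y} \sum_i e_i\right)$, which is guaranteed to be zero when the residuals sum to zero *and* are orthogonal to the predictions; fitting OLS with an intercept enforces both.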

– Ruben van Bergen

I would like to continue the discussion, since Ruben van Bergen's explanation does not seem sufficient to me and I cannot figure out what I am doing wrong.

I tried it with a simple dataset: [x,y]: [1,2], [2,1], [3,4], [4,3].

The mean is 2.5. My regression line is y = x.

The model predictions are: [x,y]: [1,1], [2,2], [3,3], [4,4]

Sum of Errors is (2-1) + (1-2) + (4-3) + (3-4) = 1 - 1 + 1 - 1 = 0 as it is supposed to be.

And yet:

TSS: (2 - 2.5)^2 + (1 - 2.5)^2 + (4 - 2.5)^2 + (3 - 2.5)^2 = 5
RSS: 1^2 + (-1)^2 + 1^2 + (-1)^2 = 4
ESS: (1 - 2.5)^2 + (2 - 2.5)^2 + (3 - 2.5)^2 + (4 - 2.5)^2 = 5

5 != 4 + 5

What am I doing wrong?
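(Editorial note: the arithmetic above is correct; the residuals sum to zero but are not orthogonal to the predictions, so the cross term in the expansion of TSS is nonzero. A small Python check, my addition rather than part of the original answer, makes the missing term explicit:)

```python
# TSS = RSS + ESS + 2 * sum((y - y_hat) * (y_hat - y_bar)); the last
# (cross) term accounts for the gap between 5 and 4 + 5.
y     = [2, 1, 4, 3]   # observed values
y_hat = [1, 2, 3, 4]   # predictions from the y = x line

y_bar = sum(y) / len(y)                                        # 2.5
tss = sum((yi - y_bar) ** 2 for yi in y)                       # 5.0
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))          # 4
ess = sum((fi - y_bar) ** 2 for fi in y_hat)                   # 5.0
cross = 2 * sum((yi - fi) * (fi - y_bar) for yi, fi in zip(y, y_hat))

print(rss + ess)          # 9.0: off from TSS by the cross term
print(rss + ess + cross)  # 5.0: matches TSS once the cross term is included
```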

  • 1
    Your calculations are correct: the residuals' summing to nought is not a sufficient condition for the total sum of squares to be the sum of the regression & error sums of squares. See e.g. https://stats.stackexchange.com/a/319449/17230. (If you fit an OLS model linear in $x$ with intercept to these data you'll find the RSS is 1.8 & the ESS 3.2.) – Scortchi - Reinstate Monica Sep 15 '23 at 12:24
  • Oh, thank you, I will have to revisit all the presumption of linear regression. And sorry for posting here, next time I will ask a new question with reference to existing one... Have a nice day :-) – David88 Sep 19 '23 at 17:15