A statement caught my attention recently: "ANOVA is just linear regression".
I was watching this video that seemed to explain the relationship between the two topics.
At one point the teacher explains how the total sum of squares breaks down into $\text{SSE}_{Reg}$ (i.e. the variability explained by the model) and $\text{SSE}_{Res}$ (i.e. the variability left in the residuals), and explains that both quantities follow a chi-squared distribution, the former with $1$ degree of freedom and the latter with $n-2$. I want to fully understand why.
I have seen other resources and everyone states the degrees of freedom, but nobody really explains why. I have some ideas, but I am not really sure and I would like to validate them here.
So, using the same notation as the video:
- $\hat{y_i}$ is the model prediction, following the equation $\hat{y_i} = b_0 + b_1x_i$
- $\bar{y}$ is the sample mean of the response variable, computed from the data
- $y_i$ is one data point in the original data
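To make the notation concrete, here is a minimal sketch that fits the line by least squares on made-up data and checks the breakdown numerically. The helper name `fit_line`, the data, and the "true" coefficients are all my own assumptions for illustration, not something taken from the video.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (simple linear regression)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data: 10 points around an assumed line y = 2 + 0.5 x with unit noise.
rng = random.Random(0)
x = [float(i) for i in range(1, 11)]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]   # model predictions
y_bar = sum(y) / len(y)              # mean of the response

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)           # SSE_Reg
ss_res = sum((yh - yi) ** 2 for yh, yi in zip(y_hat, y))  # SSE_Res

# The two printed numbers should agree up to floating-point error.
print(ss_total, ss_reg + ss_res)
```

So the split itself is easy to verify; what I am missing is the reasoning behind the degrees of freedom attached to each piece.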
Now:
$\text{SSE}_{Reg}$
- Formula: $\sum_{i=1}^n (\hat{y_i} - \bar{y})^2$
- DFs: $1$
- Reasoning: In this case $\bar{y}$ is a constant, so it doesn't count towards the degrees of freedom. The only variable parameter is $\hat{y_i}$ which in turn comes from $\hat{y_i} = b_0 + b_1x_i$.
Does this expression have $1$ degree of freedom because we are assuming the null hypothesis $H_0: b_1 = 0$? That is, under the null, $\hat{y_i} = b_0$, and $b_0$ is the degree of freedom everyone is talking about?
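One thing I can at least check by simulation is whether the claimed distribution is plausible. A chi-squared variable with $k$ degrees of freedom has mean $k$, so under $H_0$ with noise variance $\sigma^2 = 1$ the average of $\text{SSE}_{Reg}$ over many simulated data sets should be close to $1$. This is only a rough sketch: the sample size, the intercept, the seed, and the function name `ss_reg_under_null` are assumptions of mine, and a matching mean is consistent with the claim rather than a proof of it.

```python
import random

def ss_reg_under_null(x, rng, b0_true=3.0, sigma=1.0):
    """Simulate one data set with true slope 0 and return SSE_Reg."""
    n = len(x)
    y = [b0_true + rng.gauss(0.0, sigma) for _ in x]  # H0: no dependence on x
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Since b0 = y_bar - b1 * x_bar, each term equals (b1 * (x_i - x_bar))^2,
    # so the whole sum depends on the data only through the single estimate b1.
    return sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)

rng = random.Random(42)
x = [float(i) for i in range(1, 11)]  # n = 10, assumed design points
reps = 20000
mean_ss_reg = sum(ss_reg_under_null(x, rng) for _ in range(reps)) / reps

# With sigma = 1, a chi-squared(1) variable has mean 1.
print(round(mean_ss_reg, 3))
```

The comment inside the function is what makes me doubt my own reasoning: the sum seems to be driven by $b_1$ alone rather than by $b_0$.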
$\text{SSE}_{Res}$
- Formula: $\sum_{i=1}^n (\hat{y_i} - {y_i})^2$
- DFs: $n-2$
- Reasoning: In this case all the $y_i$ can vary, which gives a total of $n$. Is it correct to say that, in general, if I have a quantity built from $p$ variables, I only need $p-1$ of them? So in this case I would need $n-1$? Still, that is not $n-2$...
On the other hand, we have $\hat{y_i}$, which had $1$ df (as argued above). How do we get to $n-2$? Is it because $\hat{y_i}$ has already been estimated, so its df is subtracted, giving $n-2$?
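The same kind of numerical check works for the residual part. If $\text{SSE}_{Res}/\sigma^2$ really is chi-squared with $n-2$ degrees of freedom, its average over many simulated data sets should be close to $n-2$ (here $8$ for $n = 10$), not $n-1$ or $n$. Again a minimal sketch under my own assumptions: the true coefficients, the noise level $\sigma = 1$, the seed, and the name `ss_res_once` are made up for illustration.

```python
import random

def ss_res_once(x, rng, b0_true=2.0, b1_true=0.5, sigma=1.0):
    """Simulate one data set from y = b0 + b1 * x + noise and return SSE_Res."""
    n = len(x)
    y = [b0_true + b1_true * xi + rng.gauss(0.0, sigma) for xi in x]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return sum((b0 + b1 * xi - yi) ** 2 for xi, yi in zip(x, y))

rng = random.Random(1)
x = [float(i) for i in range(1, 11)]  # n = 10, assumed design points
reps = 20000
mean_ss_res = sum(ss_res_once(x, rng) for _ in range(reps)) / reps

# Compare the simulated mean with n - 2 (the mean of a chi-squared(n - 2) when sigma = 1).
print(len(x) - 2, round(mean_ss_res, 2))
```

Note that two parameters, $b_0$ and $b_1$, are estimated inside the function before the residuals are computed, which may be where the $2$ in $n-2$ comes from, but I would like to see the argument spelled out.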
Can anyone help me with this?