I came across the adjusted $R^2$ for multiple linear regression models, $R^{2}_{adjusted} = 1 - \frac{SSE / (n-p-1)}{SSTO / (n-1)}$, where $n$ is the number of observations and $p$ is the number of predictors, and I was curious what kinds of properties it satisfies. (Googling was not very helpful.) I can tell that if we add predictors that don't help to explain the observations, then this quantity can decrease, whereas the ordinary $R^2$ never does. Are there other interesting properties it satisfies, or is it just another somewhat arbitrary way of measuring fit?

Thanks.

Dave
user49404
  • It's definitely only one way. If you're in 2D, or maybe even in 3D, it's not a bad idea to plot your residuals. If the residuals look random, you likely have captured most of the features of interest, particularly if $R^2$ is high. – Adrian Keister Mar 10 '20 at 21:00
  • @AdrianKeister Why only in 2D or 3D? – Dave Mar 16 '20 at 12:15
  • @Dave: It's difficult to plot residuals as a function of, say, a 4D independent variable. At that point, you're dealing with 5 dimensions, and our graphing abilities are pretty much non-existent in those higher dimensions. Naturally, you can project down into lower dimensions, but you'll always lose information when you do that. – Adrian Keister Mar 16 '20 at 15:00
  • @AdrianKeister The residuals exist in the space of the response variable ($\hat{y}_i-y_i$), so I am not sure what you mean. – Dave Mar 16 '20 at 15:03
  • @Dave: True, but that's the $y$ axis when you plot. What will be your $x$ axis? – Adrian Keister Mar 16 '20 at 15:37

1 Answer

It makes a lot of sense.

$R^2$ is one minus the ratio of the residual variance (numerator) to the total variance (denominator), with each variance calculated as if the observations were the whole population. That is, it literally applies the discrete $\mathbb E\left[\left(X-\mathbb E\left[X\right]\right)^2\right]$ formula to the residuals (numerator) and to the pooled distribution of all $Y$ values (denominator).

$$ R^2=1-\dfrac{ SSE/n }{ SSTotal/n } $$

Here $SSE$ is the sum of squared residuals (“errors”) and $SSTotal$ is the total sum of squared deviations of the $y$ values from $\bar y$.

However, these are biased estimates of the respective variances!
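
To make the bias concrete, here is a short sketch under the usual textbook assumptions, which are my additions rather than something stated above: the model has an intercept and $p$ linearly independent predictors, and the errors are uncorrelated with mean zero and common variance $\sigma^2$. For the denominator, the $y$ values are treated as i.i.d. draws with variance $\sigma_Y^2$.

$$
\begin{aligned}
\mathbb E\left[SSE\right] &= (n-p-1)\,\sigma^2 &&\implies& \mathbb E\left[\frac{SSE}{n}\right] &= \frac{n-p-1}{n}\,\sigma^2 < \sigma^2,\\
\mathbb E\left[SSTotal\right] &= (n-1)\,\sigma_Y^2 &&\implies& \mathbb E\left[\frac{SSTotal}{n}\right] &= \frac{n-1}{n}\,\sigma_Y^2 < \sigma_Y^2.
\end{aligned}
$$

So both divide-by-$n$ estimates are too small on average, and the numerator is shrunk by more than the denominator whenever $p \ge 1$.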

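By dividing by $n-p-1$ in the numerator (the residual degrees of freedom, where $p$ counts the predictors and the extra $1$ accounts for the intercept) and by $n-1$ in the denominator, we now have a ratio of unbiased estimates of the error variance and the total variance of $Y$, respectively. Note that this makes each variance estimate unbiased; it does not by itself make the ratio unbiased.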
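
As a rough illustration of the two formulas side by side, here is a minimal Python sketch. The helper names `fit_simple_ols` and `r2_and_adjusted` and the data are made up for this example, and the fit uses a single predictor ($p = 1$) only to keep the code short.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor (p = 1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r2_and_adjusted(y, y_hat, p):
    """Return (R^2, adjusted R^2) for a fit with p predictors plus an intercept."""
    n = len(y)
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 so that residual degrees of freedom are positive")
    y_bar = mean(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # sum of squared residuals
    ssto = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    r2 = 1 - (sse / n) / (ssto / n)                         # divide-by-n variances
    r2_adj = 1 - (sse / (n - p - 1)) / (ssto / (n - 1))     # degrees-of-freedom corrected
    return r2, r2_adj


# Made-up illustrative data, not taken from the question or answer.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.8]

intercept, slope = fit_simple_ols(x, y)
y_hat = [intercept + slope * xi for xi in x]
r2, r2_adj = r2_and_adjusted(y, y_hat, p=1)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f}")
```

Rearranging the two definitions gives $R^2_{adjusted} = 1 - (1 - R^2)\frac{n-1}{n-p-1}$, so for $p \ge 1$ the adjusted value never exceeds the ordinary $R^2$.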

Dave
  • (+1, I mean that's the correct answer I don't get it why 0 votes. Maybe a reference to the "residuals degrees of freedom" is missing if we are somewhat picky but still...) – usεr11852 Nov 12 '22 at 20:43