I am confused as to why the standard errors of the slopes calculated by the R function 'lm()' differ from the following formula when there is more than one predictor:
$$ SE(\hat{\beta}_j) = \sqrt{\frac{\sum_{i=1}^{n}(y_{i} - \hat{y}_i)^2} {(n-2) \cdot \sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2 }} $$
When there is only one predictor, my manual calculation and the lm() function agree. However, as the number of predictors increases, the SE calculated by lm() becomes increasingly larger than the one produced by this formula, which suggests that R somehow factors in the number of predictors. I have sometimes seen the same formula with the term '(n-2)' replaced by '(n-k)', where k is the number of estimated parameters (excluding the intercept), but even with this change I cannot reproduce the results from R.
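For what it's worth, here is a quick check of the one-predictor case (the names x0, y0 and fit0 are just throwaway names for this check), where the formula and lm() give the same value:

# quick check: with a single predictor, the formula and lm() agree
set.seed(2)
x0 = rnorm(50); y0 = rnorm(50)
fit0 = lm(y0 ~ x0)
summary(fit0)$coefficients['x0', 'Std. Error']                    # SE from lm()
sqrt( sum(resid(fit0)^2) / ((50 - 2) * sum((x0 - mean(x0))^2)) )  # same value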
I found that the way R calculates it corresponds to the code sqrt(diag(vcov(model))), but I struggle to understand what this does and why it would differ from the formula above.
Here is a reproducible example that compares the two calculations of the SE of the predictor 'x1' in a model with 10 predictors in total:
# generate data with one outcome 'y' and 10 predictors 'x1', 'x2',...
set.seed(1)
n = 100
df = data.frame(y = rnorm(n, 0, 1))
for (i in 1:10) df[[paste0('x',i)]] = rnorm(n, 0, 1)
# run model
mod = lm(y ~ ., df)
# get SE of x1 from the model
summary(mod)$coefficients # SE of x1 = 0.09802912
sqrt(diag(vcov(mod))) # alternatively, this gives the same result
# manually calculate the SE of x1
sqrt( sum((df$y - mod$fitted.values)^2) / ((n-2) * sum((df$x1 - mean(df$x1))^2) ))
# this yields a SE for x1 = 0.09093564
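In case it is relevant, my understanding is that vcov(mod) is the estimated covariance matrix of the coefficients, i.e. sigma_hat^2 * (X'X)^-1; building it by hand from the design matrix reproduces the lm() SEs (but still not my formula):

# reproduce vcov(mod) from the design matrix: sigma_hat^2 * (X'X)^-1
X = model.matrix(mod)                       # 100 x 11 design matrix (intercept + 10 predictors)
sigma2 = sum(resid(mod)^2) / (n - ncol(X))  # residual variance with df = n - 11
sqrt(diag(sigma2 * solve(crossprod(X))))    # matches sqrt(diag(vcov(mod))) above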
Would anyone know why the formula no longer seems to be valid when there is more than one predictor?
Thanks in advance for the help!
EDIT:
Thank you very much for the fast and precise answer. It all makes a lot more sense to me now. So if I wanted to correct the formula, it would be:
$$ SE(\hat{\beta}_j) = \sqrt{\frac{\mathrm{VIF}_j \cdot \sum_{i=1}^{n}(y_{i} - \hat{y}_i)^2} {(n-k) \cdot \sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2 }} $$ where k is the number of estimated coefficients (including the intercept).
I can imagine this is not the way R actually computes it, but writing it this way really helps me understand the factors influencing the SE: multicollinearity increases the SE, while adding predictors has an ambiguous effect (it reduces the numerator because the SSE decreases with each additional predictor, but it also shrinks the denominator through the n-k term).
If I break it down in R now (with the same model as above), I do find the same value!
SSE = sum((df$y - mod$fitted.values)^2)
vif = car::vif(mod)['x1']
SSx1 = sum((df$x1 - mean(df$x1))^2)
k = length(coef(mod))
sqrt( (SSE * vif) / ((n-k)*SSx1) ) # = 0.09802912
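And computing the VIF by hand (as 1/(1 - R^2) from regressing x1 on the other predictors) instead of using car::vif gives the same result, which I think confirms where that factor comes from:

r2_x1 = summary(lm(x1 ~ . - y, df))$r.squared  # R^2 of x1 regressed on x2, ..., x10
vif_x1 = 1 / (1 - r2_x1)                       # same value as car::vif(mod)['x1']
sqrt( (SSE * vif_x1) / ((n-k)*SSx1) )          # should again give 0.09802912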
Thank you so much!