I am confused as to why the standard errors of the slopes calculated by the R function 'lm()' differ from the following formula when there is more than one predictor:
$$ SE(\hat{\beta}_j) = \sqrt{\frac{\sum_{i=1}^{n}(y_{i} - \hat{y}_i)^2} {(n-2) \cdot \sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2 }} $$
When there is only one predictor, my manual calculation and the lm() function agree. However, as the number of predictors increases, the SE calculated by lm() becomes increasingly larger than the one produced by this formula, which suggests that R somehow factors in the number of predictors. I have sometimes seen the same formula with the term '(n-2)' replaced by '(n-k)', where k is the number of estimated parameters (excluding the intercept), but even with this change I cannot reproduce the results from R.
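For what it's worth, here is a quick check of the one-predictor case (the names x0, y0 and fit0 are just throwaway names for this check), where the formula and lm() give the same value:

# quick check: with a single predictor, the formula and lm() agree
set.seed(2)
x0 = rnorm(50); y0 = rnorm(50)
fit0 = lm(y0 ~ x0)
summary(fit0)$coefficients['x0', 'Std. Error']                    # SE from lm()
sqrt( sum(resid(fit0)^2) / ((50 - 2) * sum((x0 - mean(x0))^2)) )  # same value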
I found that the way R calculates it corresponds to the code sqrt(diag(vcov(model))), but I struggle to understand what this does and why it would differ from the formula above.
Here is a reproducible example that compares the two calculations of the SE of the predictor 'x1' in a model with 10 predictors in total:
# generate data with one outcome 'y' and 10 predictors 'x1', 'x2',...
set.seed(1)
n = 100
df = data.frame(y = rnorm(n, 0, 1))
for (i in 1:10) df[[paste0('x',i)]] = rnorm(n, 0, 1)
# run model
mod = lm(y ~ ., df)
# get SE of x1 from the model
summary(mod)$coefficients # SE of x1 = 0.09802912
sqrt(diag(vcov(mod))) # alternatively, this gives the same result
# manually calculate the SE of x1
sqrt( sum((df$y - mod$fitted.values)^2) / ((n-2) * sum((df$x1 - mean(df$x1))^2) ))
# this yields a SE for x1 = 0.09093564
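In case it is relevant, my understanding is that vcov(mod) is the estimated covariance matrix of the coefficients, i.e. sigma_hat^2 * (X'X)^-1; building it by hand from the design matrix reproduces the lm() SEs (but still not my formula):

# reproduce vcov(mod) from the design matrix: sigma_hat^2 * (X'X)^-1
X = model.matrix(mod)                       # 100 x 11 design matrix (intercept + 10 predictors)
sigma2 = sum(resid(mod)^2) / (n - ncol(X))  # residual variance with df = n - 11
sqrt(diag(sigma2 * solve(crossprod(X))))    # matches sqrt(diag(vcov(mod))) above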
Would anyone know why the formula no longer seems to be valid when there is more than one predictor?
Thanks in advance for the help!
EDIT:
Thank you very much for the fast and precise answer. It all makes a lot more sense to me now. So if I wanted to correct the formula, it would be:
$$ SE(\hat{\beta}_j) = \sqrt{\frac{\mathrm{VIF}_j \cdot \sum_{i=1}^{n}(y_{i} - \hat{y}_i)^2} {(n-k) \cdot \sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2 }} $$ where k is the number of estimated coefficients (including the intercept).
I can imagine this is not the way R actually computes it, but writing it this way really helps me understand the factors influencing the SE: multicollinearity increases the SE, while adding predictors has an ambiguous effect (it reduces the numerator because the SSE decreases with each additional predictor, but it also shrinks the denominator through the n-k term).
If I break it down in R now (with the same model as above), I do find the same value!
SSE = sum((df$y - mod$fitted.values)^2)
vif = car::vif(mod)['x1']
SSx1 = sum((df$x1 - mean(df$x1))^2)
k = length(coef(mod))
sqrt( (SSE * vif) / ((n-k)*SSx1) ) # = 0.09802912
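And computing the VIF by hand (as 1/(1 - R^2) from regressing x1 on the other predictors) instead of using car::vif gives the same result, which I think confirms where that factor comes from:

r2_x1 = summary(lm(x1 ~ . - y, df))$r.squared  # R^2 of x1 regressed on x2, ..., x10
vif_x1 = 1 / (1 - r2_x1)                       # same value as car::vif(mod)['x1']
sqrt( (SSE * vif_x1) / ((n-k)*SSx1) )          # should again give 0.09802912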
Thank you so much!