What is the difference between standard errors and residuals in OLS?

Question

I'm trying to get a deeper understanding of how OLS works. One thing that I thought I understood is the difference between standard errors and residuals.

Here are two definitions:

Standard errors: The average distance that the observed values fall from the regression line.
Residuals: The difference between the actual value and the value predicted by the model ($y_i - \hat y_i$) for any given point.

I always assumed that the second one (the residual) was unobservable (actually, this post claims it's the other way around: https://stats.stackexchange.com/a/232588/334202).

But if I run a simple regression in R like this, I get both! So how should I think about this?

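# tidyverse supplies tibble and dplyr verbs; broom supplies augment_columns()
library(tidyverse)
library(broom)
# Keep only mpg, hp and wt from the built-in mtcars data
# (carnames is created here but dropped again by select())
mtsmall <- mtcars |>
  rownames_to_column(var = "carnames") |>
  as_tibble() |>
  select(mpg, hp, wt)
# Simple regression of mpg on hp
model1 <- lm(mpg ~ hp, mtsmall)
# Append fitted values, residuals and related diagnostics as new columns
mtsmall_predicted <- augment_columns(model1, mtsmall) |>
  rename(.mpg_hat = .fitted)
mtsmall_predicted |> head(5)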

Output:

    mpg    hp    wt .mpg_hat .se.fit .resid   .hat .sigma .cooksd
  <dbl> <dbl> <dbl>    <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
1  21     110  2.62     22.6   0.777 -1.59  0.0405   3.92 3.74e-3
2  21     110  2.88     22.6   0.777 -1.59  0.0405   3.92 3.74e-3
3  22.8    93  2.32     23.8   0.873 -0.954 0.0510   3.92 1.73e-3
4  21.4   110  3.22     22.6   0.777 -1.19  0.0405   3.92 2.10e-3
5  18.7   175  3.44     18.2   0.741  0.541 0.0368   3.93 3.89e-4
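A minimal base-R sketch (it skips broom; `res_by_hand`, `rse_by_hand` and `se_fit` are illustrative names) that checks what the `.resid` column holds, how one residual standard error (the quantity matching the "average distance" definition above) is computed from all residuals at once, and what `.se.fit` measures:

```r
# Same model as in the question, fitted with base R on the built-in data
model1 <- lm(mpg ~ hp, data = mtcars)

# Residuals: observed mpg minus fitted mpg, one value per car (the .resid column)
res_by_hand <- mtcars$mpg - fitted(model1)
all.equal(unname(res_by_hand), unname(residuals(model1)))

# Residual standard error: square root of the residual sum of squares divided
# by the residual degrees of freedom (n - 2: one intercept and one slope)
n <- nrow(mtcars)
rse_by_hand <- sqrt(sum(residuals(model1)^2) / (n - 2))
rse_by_hand
sigma(model1)  # same value; summary(model1) prints it as "Residual standard error"

# Standard error of each fitted value (the .se.fit column): the uncertainty of
# the estimated mean mpg at that hp, not the size of any residual
se_fit <- predict(model1, se.fit = TRUE)$se.fit
head(se_fit, 5)
```

The residuals are 32 separate numbers, while the residual standard error is a single summary of their spread (about 3.86 for this model). The `.sigma` column in the broom output is the same kind of quantity recomputed with each observation left out, which is why it sits near, but not exactly at, that value.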

"Actually in this post they claim it's the other way around" That other post has it right. Where did you get your definition for 'residual'? — Sextus Empiricus, Feb 02 '23 at 08:37
From Stock & Watson, 5th ed., p. 149 (I just added the "i" subscript that they use in the book, but I don't think it makes a difference?). – Tomas R, Feb 02 '23 at 08:42
When they say "the actual value" it means "the actual observed value" — Sextus Empiricus, Feb 02 '23 at 08:44
"the value predicted by the model" is not a prediction for $y$ but an estimate of $E(y)$, the mean/expectation of the population of $y$'s. Residuals are the difference between the actual observed $y$ and the estimate $\hat{E}(y)$. Errors are the difference between the actual observed $y$ and the true population value $E(y)$ (that true value is not observed). – Sextus Empiricus, Feb 02 '23 at 08:47
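To make the errors-versus-residuals distinction concrete, one can simulate data where the true mean $E(y)$ is known, which never happens with real data. This is a sketch under an assumed, purely hypothetical model $y = 2 + 3x + \varepsilon$ with $\varepsilon \sim N(0, 1.5^2)$; the names `true_mean`, `errors` and `res` are illustrative:

```r
# Hypothetical data-generating process, chosen only for illustration:
# y = 2 + 3 * x + error, with errors drawn from N(0, sd = 1.5)
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
true_mean <- 2 + 3 * x   # E(y | x): known here only because we simulated it
errors <- rnorm(n, mean = 0, sd = 1.5)
y <- true_mean + errors

fit <- lm(y ~ x)

# Residuals: observed y minus the *estimated* mean; always computable
res <- residuals(fit)
# Errors: observed y minus the *true* mean; unobservable with real data
head(cbind(error = errors, residual = res))

# The two are close but not identical, because the fitted line differs from
# the true line; OLS residuals sum to (numerically) zero, the errors do not
sum(res)
sum(errors)
```

The errors can only be printed here because the simulation generated them; with observed data, the residuals are the only observable counterpart.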

Sextus Empiricus · Answer 1 · 2023-02-02T08:34:01.780

There are errors and there are residuals; they are different things, and that difference is probably what you had understood.

You seem to be speaking about the 'standard error'. That is an indirect quantification of the 'error': it is an estimate of the standard deviation of the (estimated) sampling distribution of that value.
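To see how the "Std. Error" column in R's regression output is built from the residuals, here is a base-R sketch for the question's `mpg ~ hp` model. It uses the textbook formula for the slope's standard error in simple regression; `coef_table` and `se_slope` are illustrative names:

```r
model1 <- lm(mpg ~ hp, data = mtcars)

# Coefficient table: the "Std. Error" column holds estimated standard
# deviations of the sampling distributions of the intercept and slope
coef_table <- summary(model1)$coefficients
coef_table

# For simple regression, the slope's standard error combines the residual
# standard error (a summary of all residuals) with the spread of x:
#   se(b1) = sigma_hat / sqrt(sum((x - mean(x))^2))
x <- mtcars$hp
se_slope <- sigma(model1) / sqrt(sum((x - mean(x))^2))
se_slope
coef_table["hp", "Std. Error"]  # same value
```

So no single residual equals a standard error; the standard error is a function of all the residuals together with the design.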

edited Feb 02 '23 at 08:34

answered Feb 02 '23 at 08:28

So let me see if I got this right. The residual is the difference between the predicted value and the observed value (calculated from the data). The error is the difference from the mean in the population (most likely unobserved). Finally, the std. error is the estimate of that unobserved error? If this is true, why isn't the residual equal to the standard error? Many thanks! – Tomas R Feb 02 '23 at 08:47

@TomasR residuals and errors vary. If you roll a 6-sided die, the difference between the actual roll and the mean 3.5 will vary, and in absolute value it will be 0.5, 1.5 or 2.5. The mean (more precisely, the root mean square) of those possible error values is the standard error. The standard error is estimated from multiple residuals, which is why individual residuals can differ from the standard error. – Sextus Empiricus Feb 02 '23 at 08:56
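The die example can be checked numerically. This sketch computes the root mean square of the six possible errors exactly (the quantity the comment describes, i.e. the standard deviation of the error distribution), then estimates it from a simulated sample of 100 rolls, where the unknown mean must be replaced by the sample mean. The sample size and seed are arbitrary choices:

```r
faces <- 1:6
mu <- mean(faces)                 # 3.5, the population mean of a fair die
errors <- faces - mu              # -2.5, -1.5, -0.5, 0.5, 1.5, 2.5
sd_error <- sqrt(mean(errors^2))  # root mean square error: sqrt(35/12)
sd_error

# In practice mu is unknown: estimate it from a sample of rolls, form
# residuals around the sample mean, and estimate the spread from them
set.seed(42)
rolls <- sample(faces, size = 100, replace = TRUE)
resid_rolls <- rolls - mean(rolls)
sd_hat <- sqrt(sum(resid_rolls^2) / (length(rolls) - 1))  # same as sd(rolls)
sd_hat

# Individual residuals spread roughly between -2.5 and 2.5, while sd_hat is
# one summary number that should land near sd_error
range(resid_rolls)
```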

"std. error is the standard deviation of the sampling distribution of the error", and in practice it is an estimate. – Sextus Empiricus Feb 02 '23 at 08:59
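The "sampling distribution" idea can be simulated for a regression coefficient. Under an assumed, hypothetical true model $y = 2 + 3x + \varepsilon$, $\varepsilon \sim N(0, 1.5^2)$, with $x$ held fixed, this sketch draws many samples, records the OLS slope each time, and compares the spread of those slopes with its exact value $\sigma / \sqrt{\sum_i (x_i - \bar x)^2}$ and with the estimate that `summary()` reports from one sample. The helper `one_slope` and the replication count are illustrative choices:

```r
# Hypothetical true model for the simulation only:
# y = 2 + 3 * x + error, error ~ N(0, sd = 1.5), with x held fixed
set.seed(123)
n <- 30
x <- runif(n, 0, 10)

# Draw a fresh sample of y and return the OLS slope estimate
one_slope <- function() {
  y <- 2 + 3 * x + rnorm(n, sd = 1.5)
  coef(lm(y ~ x))[["x"]]
}

# Approximate the sampling distribution of the slope with 2000 replications
slopes <- replicate(2000, one_slope())
sd(slopes)  # spread of the simulated sampling distribution

# Exact standard deviation of that sampling distribution under this model
sd_true <- 1.5 / sqrt(sum((x - mean(x))^2))
sd_true

# With real data there is a single sample, so the reported standard error
# estimates sd_true using that sample's residuals
y_obs <- 2 + 3 * x + rnorm(n, sd = 1.5)
se_hat <- summary(lm(y_obs ~ x))$coefficients["x", "Std. Error"]
se_hat
```

`sd(slopes)` should come out close to `sd_true`, while `se_hat` is a noisier, single-sample estimate of the same quantity.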
