2

As I understand it, in the specific context of linear regression, the R output "residual standard error" is an estimate of $\sigma$, the standard deviation of the distribution of the errors. Its square is estimated by the MSE, which in general is the mean of the squared errors; in regression, however, the denominator is the residual degrees of freedom $n-p$, where $p$ is the number of parameters including the intercept.

$$MSE=\frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{n-p}$$
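The formula above can be checked numerically. A minimal sketch in Python rather than R (the small dataset is made up purely for illustration):

```python
import numpy as np

# Hypothetical toy data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit y = b0 + b1*x by ordinary least squares
X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

n, p = len(y), X.shape[1]                   # p counts the intercept
sse = np.sum((y - y_hat) ** 2)

mse = sse / (n - p)         # MSE with the residual-degrees-of-freedom denominator
sigma_hat = np.sqrt(mse)    # what R's summary.lm labels "residual standard error"

print(mse, sigma_hat)
```

Running `summary(lm(y ~ x))` on the same data in R should report the value of `sigma_hat` as the residual standard error.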

Why isn't residual standard error just called RMSE, when $\hat{\sigma}^2$ is the MSE? Or is $\hat{\sigma}^2$ often called the residual variance (making the terms for $\sigma^2$ and $\sigma$ line up)?

Dave
  • 62,186
fmtcs
  • 525
  • I think this is widely regarded as a poor decision by the R developers. I’ve definitely seen discussion about it on here but can’t think of where. – Dave Mar 09 '23 at 19:38
  • @Dave how would you call $\hat{\sigma}$ and $\hat{\sigma}^2$? – fmtcs Mar 09 '23 at 19:47
  • Unbiased error variance estimate and the square root of the unbiased error variance estimate. (I might use “unbiased standard deviation” as slang, even though Jensen’s inequality shows such a term to be wrong (square root of an unbiased estimator is biased).) – Dave Mar 09 '23 at 19:50
  • @Dave do you think calling $\hat{\sigma}^2$ the MSE is a mistake as well? – fmtcs Mar 09 '23 at 19:53
  • 1
    Referring to “the” MSE is probably a mistake, since there are reasonable arguments for multiple calculations (an $n$ denominator and an $n-p$ denominator both make sense). I would want to define explicitly what I mean if there is any ambiguity about it. However, the R decision to call $\hat\sigma$ the standard error makes no sense to me, because standard errors are associated with parameters being estimated. To which parameter does $\hat\sigma$ correspond? (I don’t have an answer, which is why I have yet to see why R calls $\hat\sigma$ a residual standard error.) – Dave Mar 09 '23 at 19:57

1 Answer

1

Referring to “the” MSE is probably a mistake, since there are reasonable arguments for multiple calculations (an $n$ denominator and an $n-p$ denominator both make sense). I would want to define explicitly what I mean if there is any ambiguity about it.
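To make the two candidate calculations concrete, here is a minimal sketch (the residual values and $p$ are made up for illustration):

```python
import numpy as np

# Hypothetical residuals from a fit with p = 2 parameters (intercept + slope);
# note they sum to zero, as OLS residuals from a model with an intercept must.
resid = np.array([0.5, -0.3, 0.1, -0.4, 0.1])
n, p = len(resid), 2
sse = np.sum(resid ** 2)

mse_ml = sse / n              # maximum-likelihood-style denominator
mse_unbiased = sse / (n - p)  # residual-degrees-of-freedom denominator

print(mse_ml, mse_unbiased)
```

Both are legitimate "mean squared error" calculations, which is why defining the denominator explicitly matters.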

However, the R decision to call $\hat\sigma$ the standard error makes no sense to me, because standard errors are associated with parameters being estimated. To which parameter does $\hat\sigma$ correspond? (I don’t have an answer, which is why I have yet to see why R calls $\hat\sigma$ a residual standard error.)

I do not really have a clean name for $\hat\sigma=\sqrt{ \frac{ \sum\left( y_i-\hat y_i \right)^2 }{ n-p} }$. While the expression inside the square root is unbiased for error variance (assuming fairly typical assumptions like the Gauss-Markov conditions), Jensen’s inequality means that $\hat\sigma$ is biased for the error standard deviation, so “unbiased error standard deviation” is not correct.
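The Jensen's-inequality bias is easy to see by simulation. A sketch under assumed settings (normal errors, true $\sigma = 1$, tiny samples of $n = 5$; none of these numbers come from the thread):

```python
import numpy as np

# Monte Carlo sketch: the square root of an unbiased variance estimate
# systematically underestimates sigma (Jensen's inequality).
rng = np.random.default_rng(0)
sigma = 1.0           # assumed true error standard deviation
n, reps = 5, 100_000  # small n makes the bias visible

samples = rng.normal(0.0, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)  # unbiased variance estimates
s = np.sqrt(s2)                   # "sigma hat" for each replicate

print(s2.mean())  # close to sigma**2 = 1 (unbiased)
print(s.mean())   # noticeably below sigma = 1 (biased low)
```

For $n = 5$ the expected value of $s$ under normality is about $0.94\sigma$, so the downward bias is far from negligible at small sample sizes.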

EDIT

The R developers appear to know that residual standard "error" is a misnomer and lament that it has crept into the documentation of so many functions that correcting it would be difficult.

Dave
  • 62,186
  • 1
    I recall seeing others on here lamenting this naming decision but cannot think offhand of where to find such posts. I welcome others to post links in the comments. – Dave Mar 18 '23 at 13:16