
Following the discussion:

Here's a link: How are the standard errors of coefficients calculated in a regression?

I've got a question. Why is the variance of $\theta$ (the regression coefficients) in theory calculated as $\sigma^2 (X^\top X)^{-1}$, where $\sigma^2$ is the variance of the residuals, $\mathrm{SSE}/(n-1)$ (with $n$ the number of samples in the training set)? In R, however, it is computed as $\mathrm{MSE} = \mathrm{SSE}/(n-p-1)$ (where $p$ is the number of features) multiplied by the same matrix $(X^\top X)^{-1}$. What is the reason? Many thanks in advance.
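For concreteness, here is a minimal R sketch (the toy data and variable names are my own) of the computation described above: the standard errors reported by `summary(lm())` match $\sqrt{\operatorname{diag}(\hat\sigma^2 (X^\top X)^{-1})}$ with $\hat\sigma^2 = \mathrm{SSE}/(n-p-1)$.

```r
set.seed(1)
n <- 100; p <- 2
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 1.5)

fit <- lm(y ~ x1 + x2)

X          <- model.matrix(fit)          # n x (p + 1) design matrix (includes the intercept)
SSE        <- sum(residuals(fit)^2)
sigma2_hat <- SSE / (n - p - 1)          # residual variance as R computes it

se_manual <- sqrt(diag(sigma2_hat * solve(t(X) %*% X)))
se_lm     <- summary(fit)$coefficients[, "Std. Error"]

cbind(se_manual, se_lm)                  # the two columns agree
```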

  • Did you mean to type beta instead of theta? – mdewey Oct 13 '17 at 13:00
  • Where did you read that "sigma^2 is variance of the residuals : SSE / n-1"? That's not what the linked thread asserts. – whuber Oct 13 '17 at 19:44
  • Yeah, sorry: theta = beta, i.e., the regression coefficients. @mdewey – Michael2016 Oct 18 '17 at 11:18
  • Yep @whuber, when you derive the maximum likelihood estimate assuming that the noise epsilon is normally distributed with variance sigma^2 and mean zero, I think it means that the residuals are distributed in that fashion. Am I wrong? Since the noise is itself the residual. – Michael2016 Oct 18 '17 at 11:21
  • It's not a maximum likelihood estimate: it's the least squares estimate. Ordinary Least Squares does not necessarily assume Normal distribution of the errors. – whuber Oct 18 '17 at 13:35

1 Answer


When you calculate in theory, you are acting as if you know the true parameter values. When you fit to data, the fitting is your way of guessing what those parameter values truly are. Much of statistical inference is about how to put yourself in a position to consistently make good guesses, but you still need to make a guess and estimate those parameters.

The variance calculation with the $n-p-1$ denominator is an unbiased estimate of the true error variance, which is a desirable property for our estimate (guess) to have. Conversely, a variance calculation with an $n$ or $n-1$ denominator gives a biased estimator that systematically underestimates the error variance. While there can be reasons to prefer biased estimators over unbiased ones, all else being equal, we like unbiased estimators (but that "all else being equal" matters).
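A quick simulation sketch (the toy setup, sample sizes, and variable names here are my own) of this point: with the true error variance fixed at $\sigma^2 = 1$, the $n-p-1$ denominator recovers it on average, while dividing by $n-1$ systematically undershoots.

```r
set.seed(1)
n <- 30; p <- 3; n_sims <- 10000
X    <- cbind(1, matrix(rnorm(n * p), n, p))   # fixed design: intercept plus p features
beta <- rep(1, p + 1)

est_npm1 <- est_nm1 <- numeric(n_sims)
for (i in seq_len(n_sims)) {
  y   <- drop(X %*% beta) + rnorm(n)           # true error variance sigma^2 = 1
  SSE <- sum(residuals(lm(y ~ X - 1))^2)       # X already contains the intercept column
  est_npm1[i] <- SSE / (n - p - 1)
  est_nm1[i]  <- SSE / (n - 1)
}

mean(est_npm1)  # approximately 1: unbiased
mean(est_nm1)   # approximately (n - p - 1) / (n - 1), i.e. below 1: biased
```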

Dave