
In maximum likelihood, we believe that the y-variable is conditionally normally distributed, so this means that the errors are also normally distributed.

In OLS regression, things seem to be more algebra/geometry driven. I am trying to fit a line of best fit through some points. I did this in high school in my vector algebra class ... and there was never any mention of a normal distribution when fitting a line of best fit.

So why do the errors in OLS need to be normally distributed?

stats_noob
  • They are not (If by OLS you mean optimising a line to minimize the sum of squared residuals) – Firebug Sep 26 '23 at 18:40
  • Hi: They only need to be normally distributed if you want to do inference and hypothesis testing. You can minimize the sum of the squared errors without the errors being normally distributed and still get a line of best fit. – mlofton Sep 26 '23 at 18:42
  • Notice that, in the MLE framework, you are maximizing a likelihood so you need a likelihood function. The assumption of normally distributed errors results in the likelihood function. So, the two frameworks give the same coefficient estimates but under different assumptions. – mlofton Sep 26 '23 at 18:44
  • thx! just to confirm ... ols does not need normal distribution errors? – stats_noob Sep 26 '23 at 19:17
  • can you explain why they need to be normal for hypothesis test and inferences? – stats_noob Sep 26 '23 at 19:17
  • The duplicate asks the same question in a negated form. "Why are errors normal in OLS?" versus "Why aren't errors normal when it is not in OLS?". – Sextus Empiricus Sep 26 '23 at 19:19
  • Similar questions/answers exist as well about whether normally distributed errors are needed for OLS to be used. Answer: normally distributed errors are not needed. So the premise in the question is not right. (example: https://stats.stackexchange.com/questions/509924/) – Sextus Empiricus Sep 26 '23 at 19:21
  • No one has quite said this yet, but maximum likelihood is about maximizing whatever likelihood function you postulate. Normal errors are just one example. So the first paragraph of the question confuses a particular if often applied example with a much more general idea. – Nick Cox Sep 26 '23 at 20:28
  • @jwolof the usual tests derive the distribution of test statistics under H0 for an assumption of iid normal errors (plus some additional assumptions). This gives exact significance levels. In large samples, significance levels may be pretty close though, even without normality. Even in small samples, you don't have to assume normality, you can derive tests under other assumptions. Or there are approximate nonparametric tests. – Glen_b Sep 27 '23 at 04:16
  • Hi: When you ask "ols does not need normal distribution errors", it depends on what part of OLS. Obtaining the coefficient estimates doesn't require normally distributed errors. But testing any hypothesis or calculating confidence intervals often use the assumption of normally distributed error terms. So, ols is kind of a vague term. – mlofton Sep 27 '23 at 04:39
  • nick: what you say is absolutely a good point. But he was referring to minimizing the squared errors of the fitted line versus maximum likelihood with normally distributed error terms so that's why I replied in the manner I did. – mlofton Sep 27 '23 at 05:13

2 Answers


In OLS, the errors do not have to have a normal distribution or even any particular distribution at all. All OLS does is solve the correspondence:

$$ \hat\beta_{OLS}\in \underset{\beta=\left( \beta_0,\beta_1,\dots,\beta_p \right)}{\arg\min}\left\{ \overset{n}{\underset{i=1}{\sum}}\left( y_i-\left( \beta_0+\beta_1x_{i1}+\dots+\beta_px_{ip} \right)\right)^2 \right\} $$

(I say that it is a correspondence instead of an equation because there does not have to be a unique solution to the $\arg\min$, such as when two features add up to a third.)
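For instance, a minimal illustrative sketch (hypothetical numbers) of how exact collinearity makes the design matrix rank-deficient, so the $\arg\min$ is not unique:

```python
import numpy as np

# Hypothetical design in which the third feature is the sum of the first two.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
X = np.column_stack([x1, x2, x1 + x2])

# Rank 2 instead of 3: X'X is singular, so infinitely many coefficient
# vectors achieve the same minimal sum of squared residuals.
print(np.linalg.matrix_rank(X))  # 2
```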

However, when you assume $iid$ Gaussian errors, the maximum likelihood estimate and OLS solution are equal.

That is the link.
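To see the link numerically, here is a minimal sketch (simulated data, a generic optimizer, and an illustrative helper `neg_log_lik`; the data and tolerances are arbitrary choices): minimizing the sum of squared residuals and minimizing the Gaussian negative log-likelihood return the same coefficients.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 features
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# OLS: minimize the sum of squared residuals (least-squares solver).
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# MLE under iid N(0, sigma^2) errors: minimize the negative log-likelihood.
def neg_log_lik(params):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)  # keeps sigma positive
    resid = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + np.sum(resid**2) / (2 * sigma**2)

fit = minimize(neg_log_lik, x0=np.zeros(X.shape[1] + 1))
beta_mle = fit.x[:-1]

print(np.allclose(beta_ols, beta_mle, atol=1e-3))  # True: same coefficients
```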

Dave
  • +1. Viewing OLS as a Gaussian GLM is a popular error, but it is very freeing to understand that OLS is much more general than that, with very desirable properties. – Demetri Pananos Sep 26 '23 at 18:59
  • ok! so errors dont have to be normally distributed in OLS regression? this means i dont need to do all sorts of tests to see if this is true? – stats_noob Sep 26 '23 at 19:14
  • I am still trying to understand .. imagine i do want to check if the errors are normally distributed ... I get the errors from the regression model ... and then see if these errors are normal? – stats_noob Sep 26 '23 at 19:16
  • @jwolof No need to check for normality. The other assumptions of OLS are more important than the distribution of the errors. – Demetri Pananos Sep 26 '23 at 19:19
  • @jwolof the assumptions about the normal distribution are needed (more as a sufficient condition and not so much as a necessary condition) for inference about the estimates like computation of p-values or confidence intervals. – Sextus Empiricus Sep 26 '23 at 19:28
  • @demetri I presume by very desirable properties you are referring to the estimators being BLUE. That's fine when linear estimators are reasonable - why not use the best of them? But sometimes all linear estimators are poor. I see many people get captivated by the B without considering the suitability of the L – Glen_b Sep 27 '23 at 04:11

As was mentioned by several commenters, ordinary least squares does not require normal errors. Normality only needs to be assumed when you want to say something about the sampling distribution of the OLS estimator in finite samples; without it, the usual test statistics are not necessarily t-distributed.
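As a rough simulation sketch (the skewed exponential errors and sample sizes here are arbitrary, illustrative choices), OLS still recovers the coefficients when the errors are far from normal:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 2000
beta_true = np.array([1.0, 2.0])

estimates = np.empty((reps, 2))
for r in range(reps):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    eps = rng.exponential(scale=1.0, size=n) - 1.0  # skewed, non-normal, mean-zero errors
    y = X @ beta_true + eps
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

print(estimates.mean(axis=0))  # close to [1.0, 2.0]: the point estimates are fine
```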

To some extent, though, we're splitting hairs. The negative log-likelihood, which is minimized to obtain $\hat{\beta}_{MLE}$, is algebraically identical to the least-squares objective function whose minimization yields $\hat{\beta}_{OLS}$. If you further don't take the normal likelihood "too literally" but rather understand it as a second-order approximation of the "true" likelihood, you have pretty much eliminated all conceptual differences between OLS and (quasi-)MLE.
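Spelling out the algebra (writing $\ell$ for the log-likelihood): under i.i.d. $N(0,\sigma^2)$ errors,

$$ -\ell(\beta,\sigma^2) = \frac{n}{2}\log\left(2\pi\sigma^2\right) + \frac{1}{2\sigma^2}\overset{n}{\underset{i=1}{\sum}}\left( y_i-\left( \beta_0+\beta_1x_{i1}+\dots+\beta_px_{ip} \right)\right)^2, $$

so for any fixed $\sigma^2$, minimizing this over $\beta$ is exactly the least-squares minimization.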

Durden