
I have several questions regarding this proof:

[Image of the quoted proof: maximizing the Gaussian log-likelihood over $\beta$ is reduced, step by step, to minimizing the sum of squared residuals.]

  1. Shouldn't $\propto$ be used instead of $=$ when we leave out the factor $\frac{1}{\sqrt{2\pi\sigma^2}}$?
  2. Is a maximization problem simply turned into a minimization problem by multiplying by $(-1)$?
  3. Is the second-to-last line correct? Why is $-(y_i - (\beta_0+x_{i1} \beta_1 + \dots))^2$ equal to $(y_i - (\beta_0+x_{i1} \beta_1 + \dots)^2)$? It seems the closing parenthesis is misplaced.
  4. In the end both approaches are the same because we take the derivative and set it to zero, $\frac{d}{d\beta}\sum_i (y_i - x_i \beta)^2 \stackrel{!}{=} 0$, so it doesn't matter whether we maximize or minimize, right? Also, $-\frac{d}{d\beta}\sum_i (y_i - x_i \beta)^2 \stackrel{!}{=} 0$ would yield the same solution, since we can simply multiply by $(-1)$.

Thanks in advance! :)

  • Looking at this abstractly will make short work of it: the ML problem is to maximize a function $f(\beta)$ while the OLS problem is to minimize a function $g(\beta).$ Since $f(\beta)=h(g(\beta),\sigma)$ where for any $\sigma\gt 0,$ the function $t\to h(t,\sigma)$ is monotonically decreasing, the ML and OLS solutions coincide. The extensive (and unnecessary) algebraic manipulation in the quoted solution only obscures this simple, intuitive idea. – whuber Jan 11 '24 at 18:50
  • Here's a related question from the archives with some very good answers. – Durden Jan 11 '24 at 18:53
  • Also thank you guys :) – BlankerHans Jan 11 '24 at 19:05
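
The monotone-transformation argument in the first comment can be made explicit (this derivation is added here for clarity and is not part of the original thread). Writing $g(\beta)=\sum_{i=1}^{n}(y_i-\mu_i)^2$ for the OLS objective, the Gaussian likelihood factors as

$$f(\beta) \;=\; \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(y_i-\mu_i)^2}{2\sigma^2}\right) \;=\; (2\pi\sigma^2)^{-n/2}\exp\!\left(-\frac{g(\beta)}{2\sigma^2}\right) \;=\; h\bigl(g(\beta),\sigma\bigr), \qquad h(t,\sigma)=(2\pi\sigma^2)^{-n/2}e^{-t/(2\sigma^2)}.$$

For any fixed $\sigma>0$, $h(t,\sigma)$ is strictly decreasing in $t$, so the $\beta$ that maximizes $f$ is exactly the $\beta$ that minimizes $g$.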

1 Answer

  1. No, because the maximization is over $\beta$, not $\sigma$; the constant factor $\frac{1}{\sqrt{2\pi\sigma^2}}$ does not depend on $\beta$ and therefore does not influence the optimal choice of $\beta$.

  2. Yes. To give another simple example of this, think of parabolas: finding the minimum of $f(x) = x^2$ is equivalent to finding the maximum of $-f(x) = -x^2$.

  3. No, the parentheses are indeed misplaced in the last two steps; what actually happens there is the sign flip you describe in point 2.

It should be:

$\arg\max_{\beta}\ \frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl(-(y_i - \mu_i)^2\bigr)$

$= \arg\max_{\beta}\ \frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl(-[y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip})]^2\bigr)$

And now you use the minus sign in front of the squared term to flip the arg max into an arg min:

$= \arg\min_{\beta}\ \frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl(y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip})\bigr)^2$
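
As a quick numerical sanity check of this chain of equalities (a minimal sketch with simulated data, added here; it is not part of the original answer and all variable names are made up), one can compare the OLS solution with the $\beta$ that maximizes the Gaussian log-likelihood for a fixed $\sigma$:

```python
# Minimal check (simulated data): the beta that maximizes the Gaussian
# log-likelihood equals the OLS beta that minimizes the squared residuals.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # intercept + p predictors
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# OLS: minimize the sum of squared residuals (closed form via least squares)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# ML: maximize the Gaussian log-likelihood over beta for a fixed sigma > 0
# (implemented as minimizing the negative log-likelihood)
sigma = 1.0  # any fixed value works; it does not change the argmax over beta
def neg_log_lik(beta):
    resid = y - X @ beta
    return n / 2 * np.log(2 * np.pi * sigma**2) + resid @ resid / (2 * sigma**2)

beta_ml = minimize(neg_log_lik, x0=np.zeros(p + 1)).x

print(np.allclose(beta_ols, beta_ml, atol=1e-4))   # expected: True
```

Only the coefficients are compared here; the factors $(2\pi\sigma^2)^{-n/2}$ and $\frac{1}{2\sigma^2}$ change the value of the objective but not its maximizer.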

  4. Whether you maximize or minimize depends on the estimator you use: you either minimize the sum of squared residuals (OLS) or maximize the likelihood (ML); the proof simply shows that both give the same solution for $\beta$.
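
Regarding point 4 of the question (this step is written out here for clarity; it is not part of the original answer): the square on the residuals must be kept inside the derivative, and the two first-order conditions differ only by a nonzero constant factor,

$$-\frac{\partial}{\partial\beta_j}\,\frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - x_i\beta\bigr)^2 \;=\; \frac{1}{\sigma^2}\sum_{i=1}^{n} x_{ij}\bigl(y_i - x_i\beta\bigr)\;\stackrel{!}{=}\;0 \;\Longleftrightarrow\; \sum_{i=1}^{n} x_{ij}\bigl(y_i - x_i\beta\bigr)=0,$$

so multiplying by $-1$ (or by $\sigma^2$) does not change the solution set: the same $\hat\beta$ solves both problems, with the second derivative merely flipping sign (a maximum of the log-likelihood, a minimum of the sum of squares).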
  • Thanks! :) Regarding 1.: that is because we set the derivative to zero and can therefore divide by $\frac{1}{\sqrt{2\pi\sigma^2}}$, right? But then why don't we also leave out the $\frac{1}{2\sigma^2}$? – BlankerHans Jan 11 '24 at 19:05
  • We actually do: from the second-to-last step to the last step, the $\sigma^2$ term disappears. You could indeed have dropped it earlier ;) – Mathemagician777 Jan 11 '24 at 19:32