
I saw a post on why MLE and MAP yield the same result under a uniform prior. But I was wondering about the case of a Gaussian prior. I suppose they are different in this case, but I do not know the right way to explain it.

Can someone tell me if my assumption is right and provide more explanation?

    MLE finds the point which maximises the likelihood while MAP finds the point which maximises the prior times the likelihood. If the prior is constant these are the same point; if the prior is not constant then they need not be the same. – Henry Oct 11 '22 at 08:29
  • @Henry so basically it depends on whether the prior is constant or not, right? It does not matter whether it is Gaussian or uniform? – jimmy1998 Oct 11 '22 at 08:49
  • for linear regression, MAP with a Gaussian prior is basically ridge regression, i.e. adding lambda × (squared 2-norm of the coefficients) to the MLE cost function, the sum of squared errors; see the sketch after these comments – seanv507 Oct 11 '22 at 08:54
  • A uniform prior is constant on its support. Any non-uniform prior is not constant on its support – Henry Oct 11 '22 at 09:22
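To illustrate the ridge-regression connection from the comment above, here is a minimal sketch (the data, the noise variance `sigma2`, and the prior variance `tau2` are invented for illustration): assuming Gaussian noise with known variance $\sigma^2$ and a $N(0, \tau^2 I)$ prior on the coefficients, the MAP estimate is the ridge solution with penalty $\lambda = \sigma^2/\tau^2$, while the MLE is ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression data (made up for illustration)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
sigma2 = 1.0                      # noise variance (assumed known)
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

tau2 = 0.5                        # prior variance: beta ~ N(0, tau2 * I)
lam = sigma2 / tau2               # equivalent ridge penalty

# MLE = ordinary least squares
beta_mle = np.linalg.solve(X.T @ X, X.T @ y)

# MAP with a zero-mean Gaussian prior = ridge regression
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print("MLE:        ", beta_mle)
print("MAP (ridge):", beta_map)   # shrunk towards zero relative to the MLE
```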

1 Answer


It's quite easy to see and doesn't need sophisticated math. In maximum likelihood estimation (MLE) you maximize

$$ \underset{\theta}{\operatorname{arg\,max}} \; p(X | \theta) $$

while with maximum a posteriori (MAP) estimation you also consider the prior for $\theta$:

$$ \underset{\theta}{\operatorname{arg\,max}} \; p(X | \theta) \, p(\theta) $$
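A minimal numerical sketch of the difference (the data and the $N(0,1)$ prior are made up for illustration, just to make the two argmaxes concrete):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Toy data: a handful of draws from N(3, 1)
x = rng.normal(loc=3.0, scale=1.0, size=5)

grid = np.linspace(-2.0, 6.0, 8001)   # candidate theta values

# log p(X | theta) evaluated on the grid
log_lik = norm.logpdf(x[:, None], loc=grid, scale=1.0).sum(axis=0)

# log p(X | theta) + log p(theta) with a Gaussian prior theta ~ N(0, 1)
log_post = log_lik + norm.logpdf(grid, loc=0.0, scale=1.0)

print("MLE argmax:", grid[np.argmax(log_lik)])   # close to the sample mean
print("MAP argmax:", grid[np.argmax(log_post)])  # pulled towards the prior mean 0
```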

Now the statement about a uniform prior is not exactly true. It holds if you consider a flat, improper prior (one that does not integrate to 1), $p(\theta) \propto 1$, i.e. equal to 1 for every value of $\theta$; then MAP maximizes

$$ \underset{\theta}{\operatorname{arg\,max}} \; p(X | \theta) \times 1 $$

and is the same as maximum likelihood. But imagine that the prior is uniform over a bounded region, say

$$ p(\theta) = \begin{cases} 1 & \theta \in (100, 101), \\[6pt] 0 & \text {otherwise}. \end{cases} $$

then MLE and MAP would agree only if the MLE returned a result between 100 and 101, because everywhere else the posterior in MAP would be equal to zero.
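To make this concrete, a small sketch reusing the grid-search idea above (the data and the cutoffs 100 and 101 are purely illustrative): with a prior that is 1 on $(100, 101)$ and 0 elsewhere, the posterior vanishes outside that interval, so the MAP estimate is forced into it even though the MLE sits far away.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.normal(loc=3.0, scale=1.0, size=20)   # data whose MLE is near 3

grid = np.linspace(-5.0, 105.0, 200001)
log_lik = norm.logpdf(x[:, None], loc=grid, scale=1.0).sum(axis=0)

# log prior: 0 inside (100, 101), -inf outside
log_prior = np.where((grid > 100) & (grid < 101), 0.0, -np.inf)

print("MLE argmax:", grid[np.argmax(log_lik)])              # near 3
print("MAP argmax:", grid[np.argmax(log_lik + log_prior)])  # forced to ~100
```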

As for a Gaussian prior, or any other non-flat prior: if you multiply the likelihood by anything other than a flat prior, you are maximizing a different function. If you have a lot of data, the prior becomes less relevant, so it can happen that MAP and MLE lead to similar results, but this does not need to happen.
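A sketch of that last point under simple assumptions (Gaussian data with known noise variance `sigma2`, prior $\theta \sim N(\mu_0, \tau^2)$; all numbers are illustrative). Here the posterior is Gaussian, so its mode (the MAP) has the closed form $\left(n\bar{x}/\sigma^2 + \mu_0/\tau^2\right) / \left(n/\sigma^2 + 1/\tau^2\right)$, which approaches the MLE $\bar{x}$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(2)

def mle_and_map(n, theta_true=3.0, sigma2=1.0, mu0=0.0, tau2=1.0):
    """Closed-form MLE (sample mean) and MAP for a Gaussian mean with a
    Gaussian prior N(mu0, tau2) and known noise variance sigma2."""
    x = rng.normal(loc=theta_true, scale=np.sqrt(sigma2), size=n)
    mle = x.mean()
    # The posterior of theta is Gaussian, so its mode (MAP) equals its mean:
    map_ = (n * mle / sigma2 + mu0 / tau2) / (n / sigma2 + 1.0 / tau2)
    return mle, map_

for n in (5, 50, 5000):
    mle, map_ = mle_and_map(n)
    print(f"n={n:5d}  MLE={mle:.3f}  MAP={map_:.3f}")
# With few observations the MAP is shrunk towards the prior mean 0;
# as n grows the two estimates become nearly identical.
```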

Tim