7

Laplace's Rule of Succession produces an estimate for the probability $p$ of a Bernoulli distribution. It starts with a $Beta(1,1)$ prior (equivalent to a uniform prior on $(0,1)$) and then obtains the Maximum A Posteriori (MAP) estimator $\frac{k+1}{n+2}$, where $k$ is the number of successes in $n$ trials.
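
For reference, combining this uniform prior with the binomial likelihood gives the posterior density

$$g(p \mid k, n) \propto p^k (1-p)^{n-k}, \qquad p \in (0,1),$$

which is the $Beta(k+1, n-k+1)$ distribution.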

Why does this MAP estimator differ from the MLE estimator $\frac{k}{n}$ despite having a uniform prior? This is especially puzzling since it is immediate from the definition of the MAP that if the prior density $g(p)$ is a constant function, then MLE = MAP.

Attempts:

  1. We could say that the $Beta(1,1)$ prior assigns density $0$ to the endpoints $p = 0, 1$ and hence is not truly uniform/constant. But aren't the endpoints irrelevant for the uniform distribution? The uniform distributions on $(0,1)$ and on $[0,1]$ differ only on a set of measure zero and so should be equivalent?

  2. The derivative of the likelihood function in the derivation of the MLE contains $p$ and $1-p$ in the denominators of fractions (the derivative is written out after this list), which means the endpoints have to be excluded from the MLE calculation. Why then do we need to include these endpoints for MLE = MAP to hold?

  3. In the definition of the MAP, how do I know what the domain of the prior $g(p)$ should be?
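
For concreteness, the derivative mentioned in attempt 2: for $0 < p < 1$ the log-likelihood is $\ell(p) = k \ln p + (n-k) \ln(1-p)$, so

$$\ell'(p) = \frac{k}{p} - \frac{n-k}{1-p},$$

which is undefined at $p = 0$ and $p = 1$; setting it to zero gives $\hat{p} = k/n$, and the boundary cases $k = 0$ and $k = n$ have to be handled separately.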

asked by Legendre, edited by Richard Hardy
  • I do not think your MAP is correct; see my comment to the answer below. What you report is the posterior mean, and the mode (= MAP) of a distribution need not coincide with its mean under asymmetry. – Christoph Hanck Feb 15 '23 at 17:57

2 Answers

10

Starting with a $\operatorname{Beta}(1,1)$ prior, your posterior would be $\operatorname{Beta}(k+1,n-k+1)$.

The mode of a $\operatorname{Beta}(k+1,n-k+1)$ distribution is $\frac{k}{n}$, which is the result you seem to want for an MAP (or MLE) estimator.

But Laplace's rule of succession instead takes the mean of the $\operatorname{Beta}(k+1,n-k+1)$ distribution, which is $\frac{k+1}{n+2}$.
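
As a quick numerical sanity check, here is a minimal sketch (assuming SciPy and NumPy are available; the counts $k$ and $n$ are made-up illustration values):

```python
import numpy as np
from scipy.stats import beta

k, n = 7, 10                        # hypothetical data: 7 successes in 10 trials
posterior = beta(k + 1, n - k + 1)  # Beta(k+1, n-k+1) posterior

# Posterior mean: (k+1)/(n+2), the rule-of-succession estimate.
print(posterior.mean(), (k + 1) / (n + 2))          # 0.6667 0.6667

# Posterior mode: locate the maximum of the pdf on a fine grid; it sits at k/n, the MLE.
grid = np.linspace(0, 1, 100001)
print(grid[np.argmax(posterior.pdf(grid))], k / n)  # 0.7 0.7
```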

Personally, I would usually take the mean of the posterior distribution, since the MAP and MLE do not correspond to a loss function and so seem difficult to justify. I might also start with a different prior, such as the Jeffreys $\operatorname{Beta}(\frac12,\frac12)$ prior.
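
For instance, the Jeffreys prior would give a $\operatorname{Beta}(k+\frac12,\,n-k+\frac12)$ posterior, whose mean is $\frac{k+\frac12}{n+1}$; like the rule of succession it shrinks the MLE toward $\frac12$, but less strongly than the uniform prior does.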

Henry
7

There is no such thing as an "uninformative" prior that brings no information to the model.

For the beta-binomial model with a uniform $\mathcal{B}(1, 1)$ prior, the mode of the posterior (the MAP) is $\frac{(x+1)-1}{(x+1)+(n-x+1)-2} = \frac{x}{n}$, so it is the same as the MLE.

If you want the mean of the posterior to be equal to the MLE, there is another prior, though an improper one, that leads to the same solution: Haldane's prior $\mathcal{B}(0, 0)$.
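
Concretely, with the $\mathcal{B}(0, 0)$ prior the posterior is $\mathcal{B}(x, n-x)$ (proper whenever $0 < x < n$), and its mean is $\frac{x}{x + (n-x)} = \frac{x}{n}$, which is exactly the MLE.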

Tim
  • Are you sure here? The posterior under a uniform prior is $Beta(k+1,n-k+1)$, and the mode (= MAP) of a beta distribution is, as per https://en.wikipedia.org/wiki/Beta_distribution#Bayesian_inference, $(k+1-1)/(k+1+n-k+1-2)=k/n$. Haldane would be a prior under which posterior mean and MLE coincide, no? – Christoph Hanck Feb 15 '23 at 17:42
  • @ChristophHanck yep, I misread the question. Fixed. – Tim Feb 15 '23 at 18:35