7

Laplace's Rule of Succession produces an estimate for the probability $p$ of a Bernoulli distribution. It starts with a $Beta(1,1)$ prior (equivalent to a uniform prior on $(0,1)$) and then obtains the Maximum A Posteriori (MAP) estimator $\frac{k+1}{n+2}$, where $k$ is the number of successes in $n$ trials.
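
For reference, combining this uniform prior with the binomial likelihood gives the posterior density

$$g(p \mid k, n) \propto p^k (1-p)^{n-k}, \qquad p \in (0,1),$$

which is the $Beta(k+1, n-k+1)$ distribution.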

Why does this MAP estimator differ from the MLE estimator $\frac{k}{n}$ despite having a uniform prior? This is especially puzzling since it is immediate from the definition of the MAP that if the prior density $g(p)$ is a constant function, then MLE = MAP.

Attempts:

  1. We could say that the $Beta(1,1)$ prior assigns density $0$ to the endpoints $p = 0, 1$ and hence is not truly uniform/constant. But aren't the endpoints irrelevant for the uniform distribution? The uniform distributions on $(0,1)$ and on $[0,1]$ differ only on a set of measure zero and so should be equivalent?

  2. The derivative of the likelihood function in the derivation of the MLE contains $p$ and $1-p$ in the denominators of fractions (the derivative is written out after this list), which means the endpoints have to be excluded from the MLE calculation. Why then do we need to include these endpoints for MLE = MAP to hold?

  3. In the definition of the MAP, how do I know what the domain of the prior $g(p)$ should be?
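
For concreteness, the derivative mentioned in attempt 2: for $0 < p < 1$ the log-likelihood is $\ell(p) = k \ln p + (n-k) \ln(1-p)$, so

$$\ell'(p) = \frac{k}{p} - \frac{n-k}{1-p},$$

which is undefined at $p = 0$ and $p = 1$; setting it to zero gives $\hat{p} = k/n$, and the boundary cases $k = 0$ and $k = n$ have to be handled separately.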

asked by Legendre, edited by Richard Hardy
  • I do not think your MAP is correct; see my comment to the answer below. What you report is the posterior mean, and the mode (= MAP) of a distribution need not coincide with its mean under asymmetry. – Christoph Hanck Feb 15 '23 at 17:57

2 Answers

10

Starting with a $\operatorname{Beta}(1,1)$ prior, your posterior would be $\operatorname{Beta}(k+1,n-k+1)$.

The mode of a $\operatorname{Beta}(k+1,n-k+1)$ distribution is $\frac{k}{n}$, which is the result you seem to want for an MAP (or MLE) estimator.

But Laplace's rule of succession instead takes the mean of the $\operatorname{Beta}(k+1,n-k+1)$ distribution, which is $\frac{k+1}{n+2}$.
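
As a quick numerical sanity check, here is a minimal sketch (assuming SciPy and NumPy are available; the counts $k$ and $n$ are made-up illustration values):

```python
import numpy as np
from scipy.stats import beta

k, n = 7, 10                        # hypothetical data: 7 successes in 10 trials
posterior = beta(k + 1, n - k + 1)  # Beta(k+1, n-k+1) posterior

# Posterior mean: (k+1)/(n+2), the rule-of-succession estimate.
print(posterior.mean(), (k + 1) / (n + 2))          # 0.6667 0.6667

# Posterior mode: locate the maximum of the pdf on a fine grid; it sits at k/n, the MLE.
grid = np.linspace(0, 1, 100001)
print(grid[np.argmax(posterior.pdf(grid))], k / n)  # 0.7 0.7
```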

Personally, I would usually take the mean of the posterior distribution, since the MAP and MLE do not correspond to a loss function and so seem difficult to justify. I might also start with a different prior, such as the Jeffreys $\operatorname{Beta}(\frac12,\frac12)$ prior.
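
For instance, the Jeffreys prior would give a $\operatorname{Beta}(k+\frac12,\,n-k+\frac12)$ posterior, whose mean is $\frac{k+\frac12}{n+1}$; like the rule of succession it shrinks the MLE toward $\frac12$, but less strongly than the uniform prior does.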

Henry
7

There is no such thing as an "uninformative" prior that brings no information to the model.

For the beta-binomial model with a uniform $\mathcal{B}(1, 1)$ prior, the mode of the posterior (the MAP) is $\frac{(x+1)-1}{(x+1)+(n-x+1)-2} = \frac{x}{n}$, so it is the same as the MLE.

If you want the mean of the posterior to be equal to the MLE, there is another prior, though an improper one, that leads to the same solution: Haldane's prior $\mathcal{B}(0, 0)$.
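
Concretely, with the $\mathcal{B}(0, 0)$ prior the posterior is $\mathcal{B}(x, n-x)$ (proper whenever $0 < x < n$), and its mean is $\frac{x}{x + (n-x)} = \frac{x}{n}$, which is exactly the MLE.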

Tim
  • Are you sure here? The posterior under a uniform prior is $Beta(k+1,n-k+1)$, and the mode (= MAP) of a beta distribution is, as per https://en.wikipedia.org/wiki/Beta_distribution#Bayesian_inference, $(k+1-1)/(k+1+n-k+1-2)=k/n$. Haldane would be a prior under which posterior mean and MLE coincide, no? – Christoph Hanck Feb 15 '23 at 17:42
  • @ChristophHanck yep, I misread the question. Fixed. – Tim Feb 15 '23 at 18:35