3

I've been reading Casella and Berger's Statistical Inference. In section 6.3 the authors state the likelihood principle: if the likelihood functions from two samples are proportional, then the inferences based on the two samples should be the same. But consider the following simple example: we have a single observation from the distribution $P(X=1)=\frac{5-2p}{3}$ and $P(X=2)=P(X=3)=\frac{p-1}{3}$. The likelihoods of $X=2$ and $X=3$ are the same, so the inferences for $p=\operatorname{E}X$ based on the observations $X=2$ and $X=3$ should be the same. This contradicts my intuition, since $X$ is a sufficient statistic for $p$ (and hence I think we should use $X$ itself to infer $p$).

So, is there really a contradiction between the likelihood principle and standard inference about a parameter, or did I do something wrong here?
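For concreteness, here is a quick numerical sketch I put together just to check the setup (the function and names are my own): the probabilities sum to one and $\operatorname{E}X=p$ for every admissible $p\in[1,\,2.5]$.

```python
import numpy as np

def pmf(x, p):
    # P(X=1) = (5-2p)/3, P(X=2) = P(X=3) = (p-1)/3; valid for 1 <= p <= 2.5
    return (5 - 2 * p) / 3 if x == 1 else (p - 1) / 3

for p in np.linspace(1.0, 2.5, 7):
    probs = [pmf(x, p) for x in (1, 2, 3)]
    assert abs(sum(probs) - 1) < 1e-12                                    # probabilities sum to 1
    assert abs(sum(x * q for x, q in zip((1, 2, 3), probs)) - p) < 1e-12  # EX = p
print("EX = p for every p checked")
```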

INvisibLE
  • Are you saying your estimate of $p$ would be $x$, because $p=\operatorname{E}X$? – Scortchi - Reinstate Monica Mar 21 '24 at 11:22
  • I do think so, because $X$ is an unbiased sufficient statistic for $p$. With only a single observation, I cannot come up with any other estimator. – INvisibLE Mar 21 '24 at 12:13
  • $X$ isn't minimal sufficient, though. A minimal sufficient statistic $T$ partitions the sample space into $X=1$ & $X\in\{2,3\}$. And if you want an unbiased estimator with lower variance than $X$ it's $\operatorname{E}[X\mid T]$ (which also won't make impossible estimates of $p$). – Scortchi - Reinstate Monica Mar 21 '24 at 12:29
  • Yeah, I agree with what you've said. If we use the UMVUE, then whether we observe $X=2$ or $X=3$, we use $\hat{p}=2.5$ to estimate $p$, which is still challenging my intuition though. – INvisibLE Mar 21 '24 at 12:36
  • Well, $\Pr(X=2\mid X\in\{2, 3\})= \Pr(X=3\mid X\in\{2, 3\}) = \tfrac{1}{2}$ regardless of the true value of $p$. It's a coin toss, & including its outcome in your estimator only increases the estimator's variance. – Scortchi - Reinstate Monica Mar 21 '24 at 14:50
  • Using a single observation to try to make inferences within statistical models is often a fool's game as you usually need more data than estimated parameters to have any chance of drawing a useful inference about the model parameters. – Michael Lew Mar 22 '24 at 03:26
  • Also, other posts deal with this, e.g. recently https://stats.stackexchange.com/questions/574597/does-the-likelihood-ratio-test-violate-the-likelihood-principle; you may want to specify more precisely how your question differs. – Sextus Empiricus Mar 22 '24 at 10:29
  • Actually, I misread your question. It seems to be based more on a confusion about the likelihood function and what it is. – Sextus Empiricus Mar 22 '24 at 10:54

3 Answers

4

$X$ is an unbiased estimator of $p$, & is indeed sufficient, but not minimal sufficient: $$\Pr(X=2\mid X\in\{2, 3\})= \Pr(X=3\mid X\in\{2, 3\}) = \tfrac{1}{2} $$ regardless of the true value of $p$; & so a minimal sufficient statistic $T$ partitions the sample space into $X=1$ & $X\in \{2,3\}$. Then if you want an unbiased estimator with lower variance than $X$, Rao–Blackwellize: $$\operatorname{E}[X\mid T]= \begin{cases} 1 & \text{ when }X=1\\ \tfrac{5}{2} & \text{ when } X\in\{2,3\}\end{cases}$$ This is about as 'standard' as inference gets. (Though perhaps more typical is the case where an unbiased estimator is not sufficient, & can be improved by Rao–Blackwellizing it: say, with a sample size greater than 1, $\bar X$ as an estimator of $p$.)
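A minimal simulation sketch of this (the variable names are my own choice): both $X$ and $\operatorname{E}[X\mid T]$ average to $p$, but the Rao–Blackwellized version has the smaller variance.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 1.8                                          # any admissible value in [1, 2.5]
probs = [(5 - 2 * p) / 3, (p - 1) / 3, (p - 1) / 3]

x = rng.choice([1, 2, 3], size=1_000_000, p=probs)

est_raw = x.astype(float)                        # the estimator X itself
est_rb  = np.where(x == 1, 1.0, 2.5)             # E[X | T]: 1 if X=1, 5/2 if X in {2, 3}

print(est_raw.mean(), est_rb.mean())             # both approximately p = 1.8
print(est_raw.var(),  est_rb.var())              # E[X | T] has the smaller variance
```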

Note that $T$ may be coded as '0' & '1', for $X=1$ & $X\in\{2,3\}$ respectively; & then writing $\pi=\Pr(T=1)=\tfrac{2(p-1)}{3}$ makes it apparent that we've been concerned with inference about the probability parameter of a single Bernoulli trial. If anything still seems unintuitive, it's perhaps because you'd typically formulate such a model straight off, without drawing unnecessary distinctions between evidentially equivalent outcomes. (Suppose parts coming off an assembly line are tested, & an unknown proportion rejected; then half of those that aren't rejected are painted red and the other half painted blue.)
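As a small sketch of that recoding (again, the names are mine): with $n$ such trials one estimates $\pi$ by the proportion of observations with $X\in\{2,3\}$ and inverts $\pi=\tfrac{2(p-1)}{3}$ to get $\hat p = 1+\tfrac{3}{2}\hat\pi$; with $n=1$ this reduces to the Rao–Blackwellized estimator above.

```python
import numpy as np

rng = np.random.default_rng(1)
p_true = 2.0
pi_true = 2 * (p_true - 1) / 3              # pi = Pr(T = 1) = Pr(X in {2, 3})

t = rng.binomial(1, pi_true, size=10_000)   # the Bernoulli trials T
pi_hat = t.mean()                           # estimate of pi
p_hat = 1 + 1.5 * pi_hat                    # invert pi = 2(p - 1)/3

print(pi_hat, p_hat)                        # roughly 2/3 and 2.0
```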

3

There are many differently worded versions of the likelihood principle, but in essence the likelihood principle says that data that yield the same (proportional) likelihood function have the same evidential meaning concerning values of the parameter(s) of interest, according to the statistical model(s). Crucially, it does not say anything at all about inferences that might be informed by such evidence.

Some statements of the likelihood principle talk of 'inference', but that is a mistake. Did Casella and Berger make such a mistake? (I really don't know, as I no longer have access to their book...) If so, here are a couple of sources that agree with me and not them.

"Within the framework of a statistical model, all of the information which the data provide concerning the relative merits of two hypotheses is contained in the likelihood ratio of those hypotheses." (Edwards 1972, 1992 p. 30)

"The likelihood principle (L): If $E$ and $E'$ are any two experiments with the same parameter space, represented respectively by density functions $f(x, \theta)$ and $g(y, \theta)$; and if $x$ and $y$ are any respective outcomes determining the same likelihood function; then $\operatorname{Ev}(E, x) = \operatorname{Ev}(E', y)$. That is, the evidential meaning of any outcome $x$ of any experiment $E$ is fully characterized by giving the likelihood function $cf(x, \theta)$ (which need be described only up to an arbitrary positive constant factor), without reference to the structure of $E$." (Birnbaum 1962)

Neither of those says anything about inference. See this answer on this site for a description of how equal evidence can lead to different inferences without any violation of the likelihood principle.

Given that the likelihood principle does not say anything about inference, your inferences about the results in question need to be informed by more than just the likelihood principle.

Birnbaum, A. (1962), ‘On the foundations of statistical inference’, Journal of the American Statistical Association 57(298), 269–306.

Edwards, A.W.F. (1992), Likelihood: expanded edition, Johns Hopkins University Press, Baltimore.

Michael Lew
  • C&B discuss precisely the framework of evidence functions and the likelihood principle (my post is based on that). I think they agree with you too, but I will have another look. – User1865345 Mar 22 '24 at 03:30
  • C&B say that if the likelihood functions are proportional, the conclusions drawn from them are identical, so I guess maybe I misunderstood the principle. Anyway, thanks for replying and providing such a detailed and comprehensive explanation. I will take a closer look later. – INvisibLE Mar 22 '24 at 03:44
  • @INvisibLE I think we need to be clear about the distinction between identical "conclusions drawn from them", where those conclusions concern the evidential support of various parameter values, and conclusions that are broader inferences. The difficulty of being explicit without being prolix may be behind a lot of misunderstandings in this area. – Michael Lew Mar 22 '24 at 03:48
  • I agree. I do find myself confused when reading about the likelihood principle. I might read your reference later. – INvisibLE Mar 22 '24 at 03:55
  • C&B go on to discuss the binomial/negative-binomial inference problem using the Formal Likelihood Principle to show "equivalent inferences from different experiments", FYI. – User1865345 Mar 22 '24 at 04:06
1

If $P(X=2)=P(X=3)=\frac{p-1}{3}$, how does $X=2$ versus $X=3$ give any different information about $p$?

Whether $X=2$ or $X=3$ happens is independent of $p$.

Imagine the following situation for generating $X$:

  • First flip a biased coin with probability $(5-2p)/3$ for heads and $(2p-2)/3$ for tails.

    • If the result is heads then assign $X:=1$
    • If the result is tails then flip another independent fair coin and assign $X:=2$ in case of heads and $X:=3$ in case of tails.

Why should that potential second coin flip, which is independent of $p$, have any influence on the inference about $p$?
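A rough simulation of that two-stage mechanism (only a sketch, with variable names of my own) shows that, conditional on the first coin coming up tails, the split between $X=2$ and $X=3$ stays at one half whatever the value of $p$:

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_x(p, n):
    # first coin: heads (prob (5-2p)/3) -> X = 1; tails -> flip a fair coin for 2 vs. 3
    first_heads = rng.random(n) < (5 - 2 * p) / 3
    second_heads = rng.random(n) < 0.5            # fair coin, independent of p
    return np.where(first_heads, 1, np.where(second_heads, 2, 3))

for p in (1.2, 1.8, 2.4):
    x = draw_x(p, 1_000_000)
    print(p, np.mean(x[x != 1] == 2))             # approximately 0.5 for every p
```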

"This contradicts my intuition since $X$ is a sufficient statistic for $p$ (and hence I think we should use $X$ to infer $p$)."

The variable $X$ is sufficient, but it is not minimal sufficient.

Instead, the minimal sufficient statistic only distinguishes $X=1$ from $X\in\{2,3\}$.

E.g. a minimal sufficient statistic is the count of the number of times that $X=1$ (with a single observation, simply the indicator of the event $X=1$).

In fact, in a way the likelihood function itself is a sufficient statistic. In your example, the likelihood functions for $X=2$ and $X=3$ being equal means that the two observations carry exactly the same information about $p$ and map to the same value of the minimal sufficient statistic.
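A brief numerical sketch of this (the helper function and grid are my own choice): the likelihood function of $p$ for the observation $x=2$ is identical to the one for $x=3$, so the data enter the likelihood only through whether $x=1$ or not.

```python
import numpy as np

def likelihood(x, p):
    # likelihood of p given a single observation x
    return (5 - 2 * p) / 3 if x == 1 else (p - 1) / 3

p_grid = np.linspace(1.0, 2.5, 11)
L = {x: np.array([likelihood(x, p) for p in p_grid]) for x in (1, 2, 3)}

print(np.allclose(L[2], L[3]))   # True: x = 2 and x = 3 give the same likelihood function
print(np.allclose(L[1], L[2]))   # False: x = 1 gives a different one
```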


In your situation you wonder about the likelihood principle when two different observations give the same likelihood function. Other situations occur where different models give the same likelihood function. Some related questions about that different angle are:

An example where the likelihood principle *really* matters?

Does the likelihood ratio test violate the likelihood principle?