
I've been going over Berger's famous example of negative binomial vs binomial sampling leading to two different p-values conditional on the same observed data. To summarize, suppose we observe 9 tails and 3 heads. We could have arrived at this data set by 1) flipping a coin until we see 3 heads or 2) flipping 12 coins. The first corresponds to negative binomial sampling and the second to binomial sampling. Berger goes on to argue that the p-value derived from the following test

$$H_0: p=.5$$ $$H_a: p \neq .5$$

will differ depending on the sampling scheme.
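As a quick numerical check (my own sketch, not from Berger; it uses the one-sided tails for simplicity, though the two-sided versions differ similarly, and it assumes scipy is available), the two schemes give different p-values for the same observed data:

```python
# Exact one-sided p-values for 3 heads, 9 tails under H0: p = 0.5.
from scipy import stats

heads, tails = 3, 9

# Scheme 2 in the text: fix n = 12 tosses and count heads.
# p-value: P(X <= 3) with X ~ Binomial(12, 0.5).
p_binomial = stats.binom.cdf(heads, heads + tails, 0.5)

# Scheme 1 in the text: toss until 3 heads, count tails.
# p-value: P(Y >= 9) with Y ~ NegBinomial(r = 3, p = 0.5),
# where Y counts tails (failures) before the 3rd head.
p_neg_binomial = stats.nbinom.sf(tails - 1, heads, 0.5)

print(f"binomial:          {p_binomial:.4f}")      # ~0.0730
print(f"negative binomial: {p_neg_binomial:.4f}")  # ~0.0327
```

Same data and same likelihood kernel, yet the tail areas (and hence the p-values) differ, because the two sampling schemes put different sample points in the tail.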

At first I thought the LRT would get around this problem, but if you compute the LRT under the binomial model, you get:

$$\lambda_{lr} = -2 \ln\left[\frac{\sup_{\theta \in \{.5\}} L(\theta)}{\sup_{\theta \in \Omega} L(\theta)}\right]$$

$$\lambda_{lr} = -2\ln\left[\frac{\binom{12}{3} .5^3(1-.5)^9}{\binom{12}{3} \hat{p}^3(1-\hat{p})^9}\right]$$ where $\hat{p} = \bar{x}$

As we can see, the binomial coefficient drops out. The same happens for the negative binomial; however, $\hat{p}$ is no longer $\bar{x}$, meaning you get a different answer and therefore a different test statistic and a different asymptotic p-value.
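For concreteness, here is a numerical evaluation of the binomial version above (a sketch assuming scipy is available; the asymptotic p-value uses the Wilks $\chi^2_1$ approximation):

```python
# Binomial LRT for H0: p = 0.5 with 3 heads in 12 tosses.
import numpy as np
from scipy import stats

heads, n = 3, 12
p_hat = heads / n  # MLE under the binomial model: 0.25

def log_lik(p):
    # Log of the likelihood kernel; the binomial coefficient
    # cancels in the ratio, so it is omitted.
    return heads * np.log(p) + (n - heads) * np.log(1 - p)

lam = -2 * (log_lik(0.5) - log_lik(p_hat))
p_asymptotic = stats.chi2.sf(lam, df=1)

print(f"lambda_lr = {lam:.3f}")             # ~3.139
print(f"asymptotic p = {p_asymptotic:.4f}")  # ~0.0764
```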

However, in the case of a specific point alternative, it seems to me that the coefficients do drop out and you get the same asymptotic p-value for both the negative binomial and binomial LRTs.
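Here is a quick check of the likelihood-ratio part of that claim (my own sketch, using an arbitrary illustrative point alternative $p_1 = .25$): the coefficients cancel within each model, so the two simple-vs-simple likelihood ratios coincide exactly:

```python
# Simple-vs-simple LR for H0: p = 0.5 vs Ha: p = 0.25 (illustrative),
# computed under both sampling schemes.
from scipy import stats

p0, p1 = 0.5, 0.25
heads, tails = 3, 9

lr_binomial = (stats.binom.pmf(heads, heads + tails, p0)
               / stats.binom.pmf(heads, heads + tails, p1))
lr_neg_binomial = (stats.nbinom.pmf(tails, heads, p0)
                   / stats.nbinom.pmf(tails, heads, p1))

print(lr_binomial, lr_neg_binomial)  # both ~0.208
```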

Big picture question: do composite LRTs violate the likelihood principle but simple LRTs do not?

Casey
  • Significance tests and p-values in general violate the likelihood principle, even if they are based on the likelihood ratio test statistic (and even for simple hypothesis testing) – J. Delaney May 09 '22 at 19:53
  • I agree in general, but in the specific case of a simple (point) alternative hypothesis things seem to work out in the LRT? Meaning you get the same asymptotic p-value? – Casey May 09 '22 at 19:59
  • No. You can have two experiments that observe the same LR but have completely different p-values. – J. Delaney May 09 '22 at 20:08
  • If two experiments have the same LR, then they have the same asymptotic p-value by Wilks' theorem, no? Since Wilks says $-2\log(\mathrm{LR}) \to \chi^2$, the asymptotic p-value depends only on the LR? – Casey May 09 '22 at 21:07
  • Wilks's theorem holds for nested hypotheses, not for comparing two simple (point) hypotheses. Yes you can find cases where p-values don't violate the likelihood principle, but those are the exception rather than the rule – J. Delaney May 10 '22 at 14:46
  • @J.Delaney That's a good point about Wilks. I see, so in general there is no rule about simple hypotheses obeying this principle. Are there any hypothesis tests that do obey the LP in general? – Casey May 10 '22 at 14:50
  • @J.Delaney What does a violation of the likelihood principle look like? How do p-values "in general" do that? Seems to me that it is the Neyman–Pearsonian decision that appears to violate the LP, but the LP says nothing about decisions. See here: https://stats.stackexchange.com/questions/378454/an-example-where-the-likelihood-principle-really-matters/641554#641554 – Michael Lew Mar 12 '24 at 20:39

2 Answers


Because LRTs (likelihood ratio tests) are based on the fiducial distribution, they can violate the likelihood principle. The ratios use the likelihood, but ultimately the question is turned into a p-value or confidence interval by asking whether the observed statistic is above some critical value or not. The likelihood or LR (likelihood ratio) is only used to find a statistic that is most powerful, not to abide by the likelihood principle.

There is a difference between likelihood and confidence. See also: *What does "fiducial" mean (in the context of statistics)?*

If $F(\hat\theta; \theta)$ is the cumulative distribution function of some parameter estimate $\hat\theta$ given the true parameter $\theta$, then the fiducial/confidence distribution is $-\frac{d}{d\theta}F(\hat\theta;\theta)$, whereas the likelihood function is $\frac{d}{d\hat\theta}F(\hat\theta;\theta)$.

Because they consider changes of probability in different directions, they can be different. We do not always have $-\frac{d}{d\theta}F(\hat\theta;\theta) = \frac{d}{d\hat\theta}F(\hat\theta;\theta)$.
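As a concrete illustration (my example, not part of the original answer): take a single observation $x$ from an exponential distribution with mean $\theta$, so $\hat\theta = x$ and $F(\hat\theta; \theta) = 1 - e^{-\hat\theta/\theta}$. Then

$$-\frac{d}{d\theta}F(\hat\theta;\theta) = \frac{\hat\theta}{\theta^{2}}\,e^{-\hat\theta/\theta} \qquad \text{whereas} \qquad \frac{d}{d\hat\theta}F(\hat\theta;\theta) = \frac{1}{\theta}\,e^{-\hat\theta/\theta}.$$

The first (the fiducial/confidence density) integrates to 1 over $\theta \in (0,\infty)$; the second (the likelihood) does not even have a finite integral in $\theta$. They are genuinely different functions of the parameter.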

  • I find that I have a renewed interest in this area. What does a violation of the likelihood principle look like? Is there really a reason that statistical 'confidence' which is based on global error rates and likelihood which documents local evidence should agree, and that any apparent disagreement is a violation of the LP? (I may have to formulate a proper question...) – Michael Lew Mar 12 '24 at 20:43
  • @MichaelLew My understanding of the Likelihood Principle is that it clashes with almost all frequentist statistics, and Birnbaum seems to have a similar view. Perhaps I am wrong. Birnbaum rejected his versions of the Likelihood Principle and the related Conditionality Principle in favour of the confidence concept of statistical evidence not that long after his 1962 paper was published. ... – Graham Bornholt Mar 13 '24 at 01:53
  • (cont) For example, “If there has been ‘one rock in a shifting scene’ …. it has not been the likelihood concept …. but rather the concept by which confidence limits and hypothesis tests are usually interpreted, which we may call the confidence concept of statistical evidence. This concept is not part of the Neyman-Pearson theory of tests and confidence region estimation, which denies any role to concepts of statistical evidence, as Neyman consistently insists.” (Birnbaum 1970, p. 1033) – Graham Bornholt Mar 13 '24 at 01:54
  • (cont2) After explaining what he means by the confidence concept having taken some aspects from the Neyman Pearson approach, he states “The absence of a comparable property in the likelihood and Bayesian approaches is widely regarded as a decisive inadequacy” (Birnbaum 1970 p.1033). – Graham Bornholt Mar 13 '24 at 01:56
  • @GrahamBornholt Thanks for those responses. Yes, I am aware that Birnbaum shifted to his 'confidence concept', but he never really defined it, as far as I could find. That is a difficult 'rock' to stand on. As far as I could tell, Birnbaum's only stated reason to disfavour the likelihood version of statistical evidence stems from confusion regarding the performance of likelihoods in problems with two parameters of interest and a single observation. That's a weak reason, IMO, but it was apparently persuasive to Ian Hacking as well. See my analysis: https://arxiv.org/pdf/1507.08394.pdf – Michael Lew Mar 13 '24 at 02:46
  • (cont) My current take on the clash between confidence (Birnbaum's or a conventional frequentist account) and likelihood evidence is like a clash between metres and pounds: they measure different things that might be useful to know, but they are largely distinct. (Either version of pounds works!) – Michael Lew Mar 13 '24 at 02:49
  • It seems that Birnbaum's confidence concept was acknowledging that actual usage in practice of frequentist methods is inferential (and should be) rather than about making 'decisions' in the N-P sense (whether testing or using confidence intervals). Busy at the moment but will have a read. – Graham Bornholt Mar 13 '24 at 03:46
  • The differences between methods, albeit a same likelihood function, can occur because likelihood and posterior are considering a different slice of the distribution $F(\hat{\theta},\theta)$ than p-values and confidence intervals. The latter condition on the parameter and consider the entire distribution. The former only cares about the derivative $\frac{\partial}{\partial \hat{\theta}} F(\hat{\theta},\theta)$ and is independent from the cumulative value $F(\hat{\theta},\theta)$ – Sextus Empiricus Mar 13 '24 at 06:16
  • But you probably already know this, so I might not understand the question from your comment. Possibly you are after the idea of the difference between different evidence and different inference, and the latter not being a real violation of the principle? – Sextus Empiricus Mar 13 '24 at 06:30
  • @SextusEmpiricus Yes, exactly. Different inferences in the face of the same evidence probably do not constitute a 'violation' because there can be differences in intention, prior information and loss functions, as I pointed out here: https://stats.stackexchange.com/questions/378454/an-example-where-the-likelihood-principle-really-matters/641554#641554 . I am now not sure what a violation of the likelihood principle would look like in practice. – Michael Lew Mar 13 '24 at 20:18
  • @MichaelLew I believe that by violation people mean different inference. Like in the example by the OP with negative binomial vs binomial. I agree that this type of 'violation' is not bad. If violation refers to 'information' instead of 'inference' then I don't know either how violation is possible. If two methods result in the same likelihood function, then they have the same information about a population parameter. That's by definition and cannot be violated. There's no choice involved in methods to 'change' the information in the data (or maybe data wrangling?). – Sextus Empiricus Mar 13 '24 at 23:05
  • @SextusEmpiricus Yes, I agree with what you say. I think that it follows from the fact that the LP is silent about inference that such a 'violation' is not a violation in the normal meaning of the word. We could probably think more clearly about many statistical conflicts if we were more clear about what is evidence and what is inference. Some of those conflicts, like this 'violation' of the LP thing, might even go away... – Michael Lew Mar 14 '24 at 00:19

Following on from the discussion in the comments above with @SextusEmpiricus and @GrahamBornholt, and from my previous thoughts in An example where the likelihood principle *really* matters?, I will say that the 'conflict' between the likelihood principle and inferences that might follow from a likelihood ratio test (or any other test, frequentist or Bayesian, or even pure likelihood) is not a conflict at all.

The likelihood principle says that data that yield the same (proportional) likelihood function have the same evidential meaning concerning values of the parameter(s) of interest, according to the statistical model(s). Crucially, it does not say anything at all about inferences that might be informed by such evidence.

Inferences will be (should be!) informed by the data (evidence) and also by at least some statistical issues such as loss functions concerning the possible inferential errors and prior information and/or prior probability distributions about parameters of interest. There are also non-statistical issues that are relevant to scientific inferences, such as whether the study in question is intended to be definitive or preliminary, how expensive it might be to follow up on important results, whether there are corroborating results available, how reliable the report or reporter is, and perhaps how well the statistical methods match the inferential intent. I am sure that you can think of other things as well, but the upshot is that there can never be a one-to-one relationship between statistical evidence and inference. And therefore most (all?) of the claimed 'violations' of the likelihood principle are nothing of the kind.

Michael Lew