
I have two binomial random variables $X_1 \sim \text{Bin}(N, p_1)$ and $X_2 \sim \text{Bin}(N, p_2)$: the same number of trials but different probabilities of success. Suppose that the number of trials $N$ is unknown, but that I have estimates of $p_1$ and $p_2$. I want to use a statistical test at a significance level of $\alpha$ to check that the observed $X_1$ and $X_2$ occur in the right proportions according to the estimated $p_1$ and $p_2$. That is, we expect $X_1 / X_2$ to be distributed around $p_1/p_2$ somehow.

Any leads on what statistic/distribution to use here?

lmyt
  • 1
    A statistical test tests parameter values, not data values. The only unknown parameter here is $N$. You could test whether $N_1 = N_2 = N$, which would, I think, be what you really want. – jbowman Feb 28 '24 at 16:26
  • I would find the maximum-likelihood $N$ given the data. Using that $N$ you can say how improbable it is to have $X_1-X_2$ as far from $Np_1-Np_2$ as you observe. – Matt F. Feb 28 '24 at 16:46
  • 1
    The question is not entirely clear; some posts that may be of help: https://stats.stackexchange.com/questions/113851/bayesian-estimation-of-n-of-a-binomial-distribution, https://stats.stackexchange.com/questions/123367/estimating-parameters-of-a-binomial-model/123748#123748 – kjetil b halvorsen Feb 28 '24 at 16:54
  • @jbowman, question has been re-phrased. I am searching for a statistical test to evaluate whether $X_1 / X_2$ is consistent with the estimated probabilities. – lmyt Feb 28 '24 at 16:59
  • 1
    That still doesn't work as a statistical test because you are trying to test data, not parameters. Perhaps you could test for whether the ratio $p_1/p_2$ that generated $X_1$ and $X_2$ is the same as the ratio $p_1/p_2$ that generated the data used to estimate the... estimates... of $p_1$ and $p_2$ that you have. For that, you'd need to know the properties of the estimates you have, though. Also note that multiple values of $p_1$ and $p_2$ can generate the same ratio, so it's not a test of equality of the probabilities themselves. – jbowman Feb 28 '24 at 17:44
  • 2
    As stated, this does not seem to have an answer. I understand that you have estimates of p1 and p2 (as opposed to simply having an estimate of p1/p2), you have 2 outcomes X1 and X2 (counts), but you do not know the N these 2 outcomes came from. You want to know if X1/X2 is statistically compatible with p1/p2. A simple example will show that this cannot be done without knowing N. Say p1=25% and p2=75%, so p1/p2=1/3. You just drew 10 and 30 successes out of N (unknown!) tries. X1/X2=1/3. tbc... – jginestet Feb 28 '24 at 22:05
  • ...cont

    Now, let's pretend that N=40: then indeed X1/X2 is totally compatible with p1/p2. This is indeed what we would expect. But, let's now pretend that N=400. Then X1/X2 is absolutely not compatible with p1=25% and p2=75%... Now, if all you know is the ratio p1/p2 (but that was not how you stated it), maybe we could check compatibility w/o knowing N. But if you know p1 and p2, you would need to also know N.

    – jginestet Feb 28 '24 at 22:06
  • 1
    What hypothesis are you testing? To adopt a clearer notation, suppose your assumed proportions are $\pi_1$ and $\pi_2$ with ratio $\pi_1/\pi_2=\alpha.$ Are you perhaps using the independent observations of $X_1$ and $X_2$ to test the hypothesis $p_1/p_2=\alpha$ against the alternative $p_1/p_2\ne\alpha$? Or maybe you're testing whether simultaneously $p_1=\pi_1$ and $p_2=\pi_2$ against the alternative that one or both of the $p_i$ differ from the $\pi_i$? Or something else? – whuber Feb 28 '24 at 22:39
  • @jginestet - that's basically an answer to any variant of the question I can imagine; why not post it as such? I'd upvote it. – jbowman Feb 29 '24 at 00:21
  • 1
    Thanks for all your observations; clearly I'm not thinking lucidly about this problem. Let me back up, then. I have observations $X_1$ and $X_2$ and my null hypothesis is that they come from binomial distributions with $p_1 = \pi_1$ and $p_2 = \pi_2$, to adopt @whuber's notation. If I knew $N$, I could use an exact binomial test for each and be done. Since I don't know $N$, it occurred to me that the ratio of the two variables would be significant since on average it is independent of $N$. Hence my focus on the ratio in the above question, though as mentioned here, it is not a good approach. – lmyt Feb 29 '24 at 12:30
  • @jginestet, it is silly to pretend we know nothing about $n$, since on any approach $n=40$ is much more plausible than $n=400$. – Matt F. Feb 29 '24 at 13:53
  • I would answer this question if it provided a brief description of a trial and sample numbers for $p_1, p_2, X_1, X_2, \alpha$. – Matt F. Mar 01 '24 at 13:40

1 Answer


Turning my earlier comment into an answer, as requested by @jbowman.

  1. Assume first that you are trying to test whether your observed counts X1 and X2 are compatible with the estimated proportions p1 and p2. That is, your null is that the success probabilities that generated X1 and X2 equal p1 and p2 (so X1/N should be close to p1 and X2/N close to p2). The alternative is that at least one of them differs from its hypothesized value.

As per my earlier comment, you cannot do that without knowing the N (equal for both samples 1 and 2) your counts came from.

A counterexample: p1 and p2 are estimated at 75% and 25% respectively, and your counts are X1=30 and X2=10. If you drew these counts from samples of size 40, then X1 and X2 are compatible with 75% and 25%. But if you drew them from samples of size, e.g., 400 (or 100, or ...), then the observed counts are NOT compatible with 75% and 25%. So you need to know N.

The reason you need N is that you are trying to compare apples to oranges. p1 and p2 are proportions, percentages: unitless numbers. But X1 and X2 are counts (e.g. number of "successes"); you cannot compare a count to a percentage. You need N to turn X1 and X2 into percentages. Then you can compare them.
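The counterexample above can be checked numerically. What follows is only a minimal sketch, assuming a two-sided exact binomial test that sums the probabilities of all outcomes no more likely than the observed one (one common convention, not the only one). The helper names `binom_pmf` and `exact_two_sided_p` are made up for this illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_two_sided_p(k, n, p):
    """Two-sided exact binomial p-value (hypothetical helper).

    Assumption: "as or more extreme" means "no more likely than the
    observed count", so we sum the pmf over all such outcomes.
    """
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so floating-point ties are counted.
    cutoff = observed * (1 + 1e-7)
    return sum(pm for pm in (binom_pmf(i, n, p) for i in range(n + 1)) if pm <= cutoff)


# Counts and hypothesized proportions from the counterexample.
x1, x2 = 30, 10
p1, p2 = 0.75, 0.25

# The same counts are tested against several candidate values of N.
for n in (40, 100, 400):
    print(n, exact_two_sided_p(x1, n, p1), exact_two_sided_p(x2, n, p2))
```

With N=40 both counts sit exactly at the mode of their distributions, so the p-values are essentially 1; with N=100 or N=400 the same counts give very small p-values. The verdict is driven entirely by the assumed N.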

  2. Now let's assume you are instead trying to see if the ratio X1/X2 is compatible with the ratio p1/p2 (now they are both ratios, or proportions, unitless, so we may be able to compare them).

To use exact binomial tests, you would need N. To use normal approximations to the binomial, you would need the variances, which also require N (since $\operatorname{Var}(X) = N p (1-p)$). You could use a bootstrap, but again you would need N (you know how many 1's to put in the sample, but not how many 0's...).
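To make the dependence on N concrete for the ratio, here is a rough first-order (delta-method) sketch. It assumes that $X_1$ and $X_2$ are independent and that $X_2$ stays well away from zero, neither of which is stated in the question:

$$
\mathbb{E}\!\left[\frac{X_1}{X_2}\right] \approx \frac{p_1}{p_2},
\qquad
\operatorname{Var}\!\left(\frac{X_1}{X_2}\right) \approx \left(\frac{p_1}{p_2}\right)^{2}\,\frac{1}{N}\left[\frac{1-p_1}{p_1} + \frac{1-p_2}{p_2}\right].
$$

So the ratio is centred near $p_1/p_2$ whatever N is, but its spread shrinks like $1/\sqrt{N}$. With $p_1 = 0.75$ and $p_2 = 0.25$ the approximate variance is $30/N$, i.e. a standard deviation of about $0.87$ for $N=40$ and about $0.27$ for $N=400$. Without N you cannot say how far from $p_1/p_2$ is "too far".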

So it would seem there is not a way?

jginestet