Suppose that I have the following data: $4,0,4,2,1,8,4,1,3,3,5,4,6,1,3$. How do we check whether the data come from a binomial distribution, say Bin($8,p$) for some $p$, or from some other distribution on $\{0,1,\dots,8\}$? Any help is greatly appreciated.

PS: Though I know a bit of statistics, I do not have a statistics background (my background is in math).

Ashok
  • Where did your value of $n=8$ come from? Was it estimated as the largest of the data values or did you know it beforehand? – whuber Mar 23 '23 at 02:26
  • @whuber Yes, it was just estimated as the largest of the data values. – Ashok Mar 23 '23 at 03:35
  • Do you know if the success probability is the same for all fifteen of the binomial trials? That is, could one be $\text{Binomial}(8, 0.2)$ while another is $\text{Binomial}(8,0.9)?$ – Dave Mar 23 '23 at 03:39
  • Perhaps start with theory: how do you expect the data to be generated? For example, if you have a fixed number of trials, each of which can be a success or a failure, a binomial may be appropriate. If it's the count of trials until a success, a geometric. If it's a counting process of some kind, a Poisson or negative binomial.

    Failing that, you may be able to use the R function fitdist to fit a variety of distributions and assess which looks best.

    – Alex J Mar 23 '23 at 04:10
  • @Dave: Yes, we know that the given sample is i.i.d. Sorry, I should have mentioned this in the problem. – Ashok Mar 23 '23 at 05:35
  • @AlexJ: I understand that what you say could be a practical solution in such a situation. However, what I'm looking for is whether there is any theory that deals with the hypothesis testing problem in my question. – Ashok Mar 23 '23 at 10:42
  • When you have to estimate $n$ and $n$ is larger than $3$ or so, typically you need a huge sample size, because the Binomial distributions with parameters $(n,p)$ and $(n^\prime,p^\prime)$ where $np = n^\prime p^\prime$ are almost indistinguishable -- and become ever closer as $n$ and $n^\prime$ increase. This is one reason we need the specifics of your actual problem in order to supply workable answers. – whuber Mar 23 '23 at 13:47
  • @whuber: In our problem $n$ is known. We know that the distribution has support $\{0,1,\dots,8\}$. All we want to test is whether it is Bin($8,p$) for some $p$ or some other distribution on $\{0,1,\dots,8\}$. – Ashok Mar 24 '23 at 01:50
  • @Ashok You have completely confused me, because when I asked you whether it was estimated or known, your answer was "it was just estimated as the largest of the data values." Has that changed? If $n$ is known, just apply the standard textbook chi-squared test. – whuber Mar 24 '23 at 13:01
  • @whuber: When I say "$n$ is known", I mean "$n$ is assumed known" (actually I took the largest of the data values). Btw, this $n$ does not necessarily mean the binomial parameter $n$. All we know is that the underlying distribution, say $P$, has support $\{0,1,\dots,8\}$. Now the problem is to test whether $P$ is Bin($8,p$) for some $p$ or some other distribution on $\{0,1,\dots,8\}$. Hope it is clear now. Does the usual chi-squared test address this problem? – Ashok Mar 25 '23 at 05:18
  • "Taking the largest of the data values" is an estimate of $n$ (and might be a poor one). You don't know the support is limited at $8$ if this is an estimate. The usual chi-squared test will not produce correct p-values in this case unless the estimate is accurate. In your example the likelihood is maximized only in the limit as $n$ grows arbitrarily large and your estimate $\hat n = 8$ is not even within a 95% confidence interval. – whuber Mar 25 '23 at 13:14
  • Have a look at https://stats.stackexchange.com/questions/123367/estimating-parameters-for-a-binomial/123748#123748 and https://stats.stackexchange.com/questions/219200/mle-for-the-binomial-distributed-data-number-of-boys-in-families/453210#453210 – kjetil b halvorsen Mar 29 '23 at 03:49
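
The goodness-of-fit idea raised in the comments can be sketched as follows. This is only an illustration under stated assumptions, not something posted in the thread: it takes $n = 8$ as truly known (which whuber's comments caution against when $n$ is the sample maximum), estimates $p$ by its maximum-likelihood value (sample mean divided by $8$), and computes Pearson's $X^2$ over the cells $0,\dots,8$. Because fifteen observations leave most expected cell counts far below the usual rule-of-thumb of 5, the sketch swaps the textbook chi-squared reference distribution for a parametric bootstrap. The function names `pearson_stat` and `bootstrap_p_value`, the replication count, and the seed are all hypothetical choices.

```python
import random
from math import comb

data = [4, 0, 4, 2, 1, 8, 4, 1, 3, 3, 5, 4, 6, 1, 3]
n = 8  # ASSUMPTION: treated as known, as in the question


def binom_pmf(k, n, p):
    """Probability that a Bin(n, p) variable equals k."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def pearson_stat(sample, n):
    """Pearson X^2 of the sample against Bin(n, p_hat), p_hat = mean / n."""
    size = len(sample)
    p_hat = sum(sample) / (n * size)
    if p_hat in (0.0, 1.0):
        return 0.0  # degenerate sample (all 0 or all n): the fit is exact
    stat = 0.0
    for k in range(n + 1):
        expected = size * binom_pmf(k, n, p_hat)
        observed = sample.count(k)
        stat += (observed - expected) ** 2 / expected
    return stat


def bootstrap_p_value(sample, n, reps=2000, seed=0):
    """Parametric-bootstrap p-value: simulate from Bin(n, p_hat),
    re-estimate p on each simulated sample, and compare X^2 values."""
    rng = random.Random(seed)
    size = len(sample)
    p_hat = sum(sample) / (n * size)
    observed = pearson_stat(sample, n)
    exceed = 0
    for _ in range(reps):
        # One binomial draw = number of successes in n Bernoulli trials.
        sim = [sum(rng.random() < p_hat for _ in range(n)) for _ in range(size)]
        if pearson_stat(sim, n) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (reps + 1)


stat, p_value = bootstrap_p_value(data, n)
print(f"X^2 = {stat:.2f}, bootstrap p-value = {p_value:.4f}")
```

Most of the statistic comes from the single observation equal to $8$, whose expected count under the fitted binomial is tiny, so the result is very sensitive to the choice of $n$.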

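whuber's remark that the likelihood is maximized only as $n$ grows without bound can be checked numerically. The sketch below is an assumption-laden illustration, not code from the thread: for each candidate $n$ it plugs in $\hat p = \bar x / n$ and evaluates the binomial log-likelihood (a profile log-likelihood), then compares against the Poisson log-likelihood with rate $\bar x$, which is the limit as $n \to \infty$ with $np$ fixed. The helper names are hypothetical, and comparing twice the log-likelihood gap with the 95% chi-squared cutoff of about $3.84$ (one degree of freedom) is only one common reading of the "95% confidence interval" in that comment.

```python
from math import lgamma, log

data = [4, 0, 4, 2, 1, 8, 4, 1, 3, 3, 5, 4, 6, 1, 3]


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def profile_loglik(n, sample):
    """Binomial log-likelihood at size n with p set to its MLE, mean / n.
    Requires n >= max(sample)."""
    total = sum(sample)
    size = len(sample)
    p = total / (n * size)
    return (
        sum(log_choose(n, x) for x in sample)
        + total * log(p)
        + (n * size - total) * log(1 - p)
    )


def poisson_loglik(sample):
    """Poisson log-likelihood at the MLE rate (the sample mean)."""
    lam = sum(sample) / len(sample)
    return sum(x * log(lam) - lam - lgamma(x + 1) for x in sample)


for n in (8, 10, 20, 50, 100, 1000):
    print(n, round(profile_loglik(n, data), 3))
print("Poisson limit", round(poisson_loglik(data), 3))

# Twice the gap between the limiting value and the value at n = 8.
gap = 2 * (poisson_loglik(data) - profile_loglik(8, data))
print("2 * (limit - loglik at n=8) =", round(gap, 3))
```

For these data the sample variance exceeds the sample mean, which is the situation in which the profile log-likelihood keeps rising with $n$ instead of peaking at a finite value.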
0 Answers