0

Main question

Probability theory can be a weird place sometimes. Here I was, confident in my insane math skills, trying to solve the following problem:
Let $N, \alpha$ and $\beta$ be given.

  • Draw $N$ coins from a fixed bag with different coins, such that the success probability for each coin is unknown but modeled by a Beta distribution $Beta(\alpha, \beta)$.
  • Flip each coin from the bag once and count the number of successes.

How are the number of successes distributed?


What I have tried so far

Based on some googling I thought it should be the Beta-Binomial distribution!". But this quick Python implementation:

sample_size = 2**20

N = 42 alpha = 3.14159 beta = 2.71828

coins = np.random.beta(alpha, beta, size=N)

sample = [] for i in range(sample_size): R = np.random.uniform(size=N) wins = np.sum(np.where(coins>R, 1, 0)) sample.append(wins)

leads to the histogram:

enter image description here

Instead, it seemed like the simulated distribution just resembles a "normal" binomial distribution:

enter image description here

where $\hat{\mu}$ is the sample mean. However, a simple $\chi^2$-test rejects this hypothesis:

>>> cats, f_obs = zip(*[(cat, len(list(group))) for cat,group in groupby(sorted(sample))])
>>> f_exp = scipy.stats.binom.pmf(k=cats, n=N, p=np.mean(sample)/N)*sample_size
>>> 
>>> scipy.stats.chisquare(f_obs=f_obs, f_exp=f_exp)
Power_divergenceResult(statistic=14303.936045316765, pvalue=0.0)

I stumbled upon this answer but @probabilityislogic seems to be suggesting to do exactly what I have done.

My question now is: What kind of distributions are generated by this experiment and why is it not the Beta-Binomail distribution?

1 Answers1

1

Your error is that you use precisely the same $42$ probabilities (coins) in each of the $2^{20}$ simulations, and so lose the element of sample variance which would come from these varying

You would get the Beta-Binomial distribution if you chose new probabilities each time, for example putting

coins = np.random.beta(alpha, beta, size=N)

inside the for loop

Henry
  • 39,459
  • Thanks for this insight! I guess my question then is not well posed. In my actual problem where I need this the bag of coins indeed is fixed, and we only know that the success probabilities of the (unknown but fixed) coins are modeled by a Beta-distribution. – chickenNinja123 Feb 22 '19 at 10:45
  • I modified the question box now. – chickenNinja123 Feb 22 '19 at 10:48