
When a variable is binary, it sure seems like its distribution is totally characterized by the probability of being in one group: the variable takes one value with probability $p$ and the other value with probability $1-p$. This is essentially a Bernoulli distribution.

But...

There is a beta-binomial distribution, which describes $n$ trials (flips of a coin) of a Bernoulli random variable, where the probability $p$ shared by the trials is itself drawn from a beta distribution. Thus, the parameters of a beta-binomial distribution are not $n$ and $p$ as for the binomial but $n$ (still) along with the $a$ and $b$ of a beta distribution.

Beta-binomial distributions on $n$ trials can give distributions that simply cannot be achieved by binomial distributions, so the beta-binomial is different from the usual binomial. When we restrict the beta-binomial to be on only one trial, do we get anything useful that is not captured by the usual Bernoulli probability parameter?
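One quick way to see the difference for $n > 1$ is through the variances. A sketch of my own, using the standard beta-binomial variance formula and arbitrarily chosen $a$, $b$, and $n$:

```r
# Compare the variance of a binomial(n, p) with that of a beta-binomial(n, a, b)
# whose underlying beta distribution has mean p (parameter values chosen arbitrarily).
a <- 2
b <- 3
n <- 10
p <- a / (a + b)              # matched mean of the underlying beta
var_binom <- n * p * (1 - p)  # binomial variance
# Standard beta-binomial variance: n p (1 - p) (a + b + n) / (a + b + 1)
var_bb <- n * p * (1 - p) * (a + b + n) / (a + b + 1)
var_bb > var_binom            # TRUE: overdispersed whenever n > 1
# With a single trial (n = 1) the extra factor is (a + b + 1)/(a + b + 1) = 1,
# so the beta-binomial variance collapses to the Bernoulli's p(1 - p).
var_bb1 <- 1 * p * (1 - p) * (a + b + 1) / (a + b + 1)
```

With these values the beta-binomial variance is $2.5$ times the binomial's, while at $n = 1$ the two variances coincide, which already hints at what happens with a single trial.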

That's just one idea. There are all kinds of other distributions one can put on $[0,1]$ that are not beta distributions but are valid distributions from which Bernoulli probability parameters can be drawn. When we have multiple trials of a Bernoulli, I definitely get why more than just binomial is in play. Is this useful for just one Bernoulli trial, however?

I am thinking of a situation where we have $iid$ data like $0,1,0,0,1,0,1,1,1,0$. Sure, it seems like that is Bernoulli$(0.5)$, but could it be beta-Bernoulli$(a,b)$ for some values of $a$ and $b$ that are parameters of a beta distribution?

(Really, I want to consider this in a logistic-ish regression where the outcome is binary and the conditional distribution is modeled. With just one trial, though, I am not sure how the modeling would go. Ultimately, there is a probability of an event happening or not and then the event happens with that probability...or does it?)

EDIT

Simulating in R, it sure seems like there is not a difference.

library(ggplot2)
set.seed(2023)
N <- 10000
R <- 10000
a <- 1/3
b <- 1

p_bernoulli <- p_betabernoulli <- rep(NA, R)
for (i in 1:R){
  p <- rbeta(N, a, b)
  p_betabernoulli[i] <- mean(rbinom(N, 1, p))
  p_bernoulli[i] <- mean(rbinom(N, 1, a/(a + b)))
  if (i %% 75 == 0 | i < 5){
    print(paste(i/R*100, "% Complete", sep = ""))
  }
}
d0 <- data.frame(p = p_bernoulli, Distribution = "Bernoulli")
d1 <- data.frame(p = p_betabernoulli, Distribution = "Beta-Bernoulli")
d <- rbind(d0, d1)
ggplot(d, aes(x = p, fill = Distribution)) +
  geom_density(alpha = 0.25)

[Figure: overlaid kernel density estimates of the simulated proportions under the Bernoulli and beta-Bernoulli schemes]

Dave
    Short answer: no. The only possible distributions on a binary set like $\{0,1\}$ are determined by the probabilities of $0$ and $1$ which, because they must sum to unity, are parameterized by one of those probabilities: and that gives the Bernoulli distribution. – whuber Mar 30 '23 at 20:28
  • @whuber So my idea for a beta-Bernoulli logistic-style regression is not fruitful the way that a beta-binomial regression with multiple trials would be? That would be both disappointing and relieving. // Would similar logic apply for a "Dirichlet-multinomial" on one trial? – Dave Mar 30 '23 at 20:30
    Considering a sequence of binary random variables is revealing of a useful distinction. The sequence of r.v.s could be non-independent, or have non-identical probabilities $p$. Bernoulli and binomial distributions, by construction, assume fixed $p$ and independent trials. Beta-* characterizes non-identical $p$ drawn from a fixed beta distribution, but does not admit non-independent trials. – Sycorax Mar 30 '23 at 20:36
  • @Sycorax And if that sequence is $iid$ because it is the same beta-binomial$(1, 1, 1)$ each time? – Dave Mar 30 '23 at 20:37
  • I don't understand your comment. – Sycorax Mar 30 '23 at 20:48
    In a sequence of 0/1 trials, you'd need either dependence or the $p$ parameter to vary; in either case you're no longer talking about "a" distribution but rather a marginal (presumably) distribution across trials. – Glen_b Mar 30 '23 at 20:53
  • @Sycorax $X_1,\dots,X_n\overset{iid}{\sim}\text{beta-binomial}(1, a,b)$ for $a$ and $b$ as the parameters of a beta distribution – Dave Mar 30 '23 at 22:27
  • If you are interested in departures from Bernoulli then you might like to think about the outcomes of actual coin tossing, as opposed to theoretical coin tossing. The probability of heads is affected to a variable degree by which face of the coin is upwards, and some tossers can bias the probability of heads quite a bit. Plain Bernoulli doesn't really apply except as a convenient approximation. – Michael Lew Jan 31 '24 at 20:38
  • @MichaelLew Care to expand on that in an answer? – Dave Jan 31 '24 at 20:44
  • @Dave No, I won't do that, but it has already been done here on CV. https://stats.stackexchange.com/questions/574222/what-physical-conditions-give-rise-to-fairness-of-a-coin-toss-in-statistics – Michael Lew Jan 31 '24 at 20:47

1 Answer


I found another simulation to be suggestive.

set.seed(2024)
N <- 100000
a <- 1
b <- 1
mu <- a/(a + b) # Mean of a beta distribution
x_binomial <- x_betabinomial <- rep(NA, N)

# Simulate N-many draws from the Bernoulli and Beta-Bernoulli distributions,
# with the Beta-Bernoulli having its probability parameter changing according
# to random draws from Beta(a, b) but with the same expected value as the
# probability parameter (mu) of the Bernoulli
for (i in 1:N){

  # Simulate one draw from a binomial distribution on one coin flip
  x_binomial[i] <- rbinom(1, 1, mu)

  # Simulate one draw from a beta-binomial distribution on one coin flip
  # First find the probability parameter from a beta distribution
  p <- rbeta(1, a, b)

  # Draw from the Bernoulli distribution with that probability parameter
  x_betabinomial[i] <- rbinom(1, 1, p)

}

# Test if the two distributions are different

# T-test
t.test(x_binomial, x_betabinomial) # I get p = 0.3122

# Proportion test
prop.test(rbind(table(x_binomial), table(x_betabinomial))) # I get p = 0.3143

This isn't exhaustive, but the simulated distributions of $0$s and $1$s look essentially the same: whether we treat the data as coming from a fixed-parameter Bernoulli or from a beta-Bernoulli, we get matching distributions.

To show this in general, let's look at the PMFs. Let $X\sim\text{Bernoulli}(p)$ and $Y\sim\text{BetaBernoulli}(a, b)$, and let $p = \frac{a}{a + b}$, the mean of the Beta distribution underlying the Beta-Bernoulli.

(By a beta-Bernoulli distribution, I mean a beta-binomial distribution on one flip of a coin whose probability comes from a draw from some beta distribution, i.e., $\text{BetaBinomial}(1, a, b)$.)

$$ P_X(x) = \begin{cases} 1 - p & \text{if}\space\space x = 0\\ p & \text{if}\space\space x = 1 \end{cases} \\ P_Y(x) = {1\choose x}\dfrac{ B(x + a, 1 - x + b) }{ B(a, b) } $$

$$ P_Y(0) = {1\choose 0}\dfrac{ B(0 + a, 1 - 0 + b) }{ B(a, b) } = \dfrac{ B(a, 1 + b) }{ B(a, b) } \\ P_Y(1) = {1\choose 1}\dfrac{ B(1 + a, 1 - 1 + b) }{ B(a, b) } = \dfrac{ B(1 + a, b) }{ B(a, b) } $$

A property of the beta function $B$ is that $B(x, 1 + y) = B(x, y)\dfrac{y}{x + y}$, so

$$ P_Y(0) = \dfrac{B(a, 1 + b)}{B(a, b)} = \dfrac{b}{a + b} = 1 - p = P_X(0) $$

Another property of $B$ is that $B(x + 1, y) = B(x, y)\dfrac{x}{x + y}$, so: $$P_Y(1) = \dfrac{B(a + 1, b)}{B(a, b)} = \dfrac{a}{a + b} = p = P_X(1)$$
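As a numeric sanity check on these two identities (my own addition, using base R's `beta` function and arbitrary positive parameters):

```r
# Verify B(x, y + 1) = B(x, y) * y / (x + y) and B(x + 1, y) = B(x, y) * x / (x + y)
# for arbitrary positive parameters, using base R's beta().
a <- 1/3
b <- 2.5
lhs0 <- beta(a, b + 1)            # appears in P_Y(0)
rhs0 <- beta(a, b) * b / (a + b)
lhs1 <- beta(a + 1, b)            # appears in P_Y(1)
rhs1 <- beta(a, b) * a / (a + b)
c(lhs0 - rhs0, lhs1 - rhs1)       # both differences are ~0
# Hence P_Y(0) = b/(a + b) and P_Y(1) = a/(a + b), as derived.
```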

Overall, we get:

$$ P_Y(x) = \begin{cases} 1 - p & \text{if}\space\space x = 0\\ p & \text{if}\space\space x = 1 \end{cases} $$

Thus, if we view the distribution as beta-bernoulli with some underlying $\text{Beta}(a, b)$, we wind up with the same distribution as if we started with a Bernoulli distribution with probability parameter $\frac{a}{a + b}$, the expected value of that underlying $\text{Beta}(a, b)$. The two viewpoints lead to the same distribution.

True, this only addresses the beta-Bernoulli possibility, but it leaves me pessimistic about any hierarchy (Bernoulli parameter has a distribution on $[0, 1]$) yielding a different distribution. It seems like the proportion of one value totally defines a binary variable.

Dave
    Simple example: suppose one coin always lands heads and another always lands tails, so their probabilities of heads, $1$ and $0$, are extremely different. You randomly (50/50) choose one of these coins and toss it. In total, the proportions of heads and tails you observe are both $0.5$, the same as for a single unbiased coin with $p = 0.5$. – BenP Mar 15 '24 at 07:34
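The two-coin mixture in the last comment is easy to simulate; a minimal sketch of my own (not from the comment):

```r
# 50/50 mixture of a coin that always lands 1 (p = 1) and a coin that always
# lands 0 (p = 0): marginally indistinguishable from a single fair coin.
set.seed(1)
n <- 100000
p_coin <- rbinom(n, 1, 0.5)   # pick the p = 1 coin or the p = 0 coin each toss
x <- rbinom(n, 1, p_coin)     # toss the chosen coin
mean(x)                       # close to 0.5, just like Bernoulli(0.5)
```

This is the degenerate extreme of the beta-Bernoulli construction above (all of the beta's mass pushed toward $0$ and $1$), and the marginal distribution is still plain Bernoulli.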