Let's say I have an API, and customers are calling my API. I want to model this as a Binomial distribution because either the call can fail or succeed.

To use a binomial distribution, the API calls need to be independent events. I assume these are not independent events because if the server goes down, then all API calls will fail. (One failure makes it more likely the next one will fail too).

Am I right?

1 Answer

TL;DR: In general, the users' calls are dependent because they all rely on the same server. However, it may be more tenable to assume the calls are conditionally independent given a functioning server (and in the absence of any other shared cause of failure, e.g. two users who work at the same company and have both been locked out for some reason).

Here is one way to think about the problem. Let's say we have two users, A and B, who truly are independent of one another when the server is functioning. This means that when the server is functioning, the probability of A and B both making successful calls is the product of the probabilities of each of them making a successful call. Let's further assume that the API calls do not affect the server; the server can only affect the API calls.

Let $A$ be the event that A makes a successful call, $B$ be the event that B makes a successful call, and $S$ be the event that the server is functioning. This problem has a joint distribution

$$P(A, B, S)$$

A and B are independent if $P(A,B) = P(A)P(B)$. That is, the joint probability is the product of the marginals. Now, when the server is working, A and B factor by assumption:

$$P(A, B, S=1) = P(A\mid S=1)P(B\mid S=1)P(S=1)$$

But when the server is not working, neither can make a successful call, so

$$P(A, B, S=0) = 0$$

So that means the joint distribution is

$$ \begin{align} P(A, B) &= P(A, B\mid S=1)P(S=1) + P(A, B\mid S=0)P(S=0) \\ &= P(A, B\mid S=1)P(S=1) + 0\cdot P(S=0) \\ &= P(A, B\mid S=1)P(S=1) \end{align}$$

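By the same argument, the marginals are $P(A) = P(A\mid S=1)P(S=1)$ and $P(B) = P(B\mid S=1)P(S=1)$, so $P(A)P(B) = P(A\mid S=1)P(B\mid S=1)P(S=1)^2$, whereas $P(A, B) = P(A\mid S=1)P(B\mid S=1)P(S=1)$. The two differ whenever $0 < P(S=1) < 1$ and both users have a nonzero chance of success. So the joint distribution does not factor into the product of the marginals, hence A and B are dependent (through S). However, they are conditionally independent given that the server is working. So any analysis you do which requires an independence assumption would need to be restricted to periods when the server is working.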
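To make the argument concrete, here is a minimal numeric sketch of the model above. The probabilities (`p_server`, `p_a_up`, `p_b_up`) and the helper `joint_and_marginals` are hypothetical values and names chosen purely for illustration; they are not estimates for any real API.

```python
import math


def joint_and_marginals(p_server, p_a_given_up, p_b_given_up):
    """Return (P(A,B), P(A), P(B)) under the model in this answer.

    Assumptions: A and B are independent given S=1, and neither call
    can succeed when S=0.
    """
    p_ab = p_a_given_up * p_b_given_up * p_server  # P(A,B | S=1) P(S=1)
    p_a = p_a_given_up * p_server                  # P(A | S=1) P(S=1)
    p_b = p_b_given_up * p_server                  # P(B | S=1) P(S=1)
    return p_ab, p_a, p_b


# Hypothetical numbers, for illustration only.
p_server, p_a_up, p_b_up = 0.9, 0.95, 0.8

p_ab, p_a, p_b = joint_and_marginals(p_server, p_a_up, p_b_up)

# Marginal check: does P(A,B) equal P(A) P(B)?
marginally_independent = math.isclose(p_ab, p_a * p_b)

# Conditional check: does P(A,B | S=1) equal P(A | S=1) P(B | S=1)?
conditionally_independent = math.isclose(p_ab / p_server, p_a_up * p_b_up)

print("P(A,B)    =", p_ab)
print("P(A)P(B)  =", p_a * p_b)
print("marginally independent:   ", marginally_independent)
print("conditionally independent:", conditionally_independent)
```

With these numbers the marginal check fails and the conditional check passes. The gap between $P(A,B)$ and $P(A)P(B)$ is a factor of $P(S=1)$, so it shrinks as the server becomes more reliable and vanishes only if the server is always up (or always down).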