I'm trying to calculate the standard deviation of a series of binary outcomes and would like confirmation of whether my SDEV computations are correct. (As explained further below, the results highlighted in yellow in the illustration don't make sense to me, which is why I am asking for help.) I am calculating the rate at which a population of elements reaches a state "C" over time. This is a binary outcome: either an element reaches state C or it doesn't.

Columns F and K show my SDEV calculations, with the formulas displayed immediately to the right of each column. Column F calculates SDEV for each period independently (based on elements reaching state C in each period per Column B), and column K calculates SDEV on a cumulative basis (based on elements cumulatively reaching state C per Column C).

Where things don't make sense is shown in rows 16-20 below: expanding the population by a factor of 10, while keeping the rates of reaching state C the same, results in a much higher SDEV (columns F and K, rows 16-20). Clearly I am doing something wrong.

Ultimately, what I want to do here is run simulations to derive a distribution of possible cumulative C percentages in Period X, so I need to make sure I get SDEV correct. The actual population of elements I am working with is 68,000.

[Image: spreadsheet illustration referenced above; not reproduced]

1 Answer

You are confusing the Bernoulli distribution with the binomial distribution. The Bernoulli distribution is a distribution for a single binary event, where the probability of "success" is $p$ and the standard deviation is $\sqrt{p(1-p)}$. The binomial distribution describes running $n$ independent Bernoulli trials with the same probability $p$ (e.g. tossing a coin $n$ times); it is a distribution for the total number of successes. This second distribution has a standard deviation equal to $\sqrt{np(1-p)}$.

You calculated the standard deviation for the binomial distribution, i.e. for the number of successes, so it grows with $n$. Maybe you wanted to calculate the standard error of the estimated proportion $p$ instead? That quantity is $\sqrt{p(1-p)/n}$, and it shrinks as $n$ grows.

So your calculations are correct, but either you are misinterpreting them, or you wanted the standard error and calculated the standard deviation instead.
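To make the distinction concrete, here is a minimal Python sketch of the three quantities discussed above. The rate `p = 0.05`, the smaller population of 6,800, and all function names are illustrative assumptions, not values taken from the spreadsheet. The small simulation at the end assumes independent elements with a common probability `p`, which may not hold for the real data.

```python
import math
import random


def bernoulli_sd(p):
    """Standard deviation of a single binary outcome."""
    return math.sqrt(p * (1 - p))


def binomial_count_sd(n, p):
    """Standard deviation of the NUMBER of successes in n trials."""
    return math.sqrt(n * p * (1 - p))


def proportion_se(n, p):
    """Standard error of the PROPORTION of successes in n trials."""
    return math.sqrt(p * (1 - p) / n)


p = 0.05  # hypothetical rate of reaching state C (assumption)

# Growing n by a factor of 10 raises the SD of the count by sqrt(10)
# and lowers the standard error of the proportion by sqrt(10).
for n in (6_800, 68_000):
    print(n, binomial_count_sd(n, p), proportion_se(n, p))


def simulate_proportions(n, p, reps, seed=0):
    """Simulate `reps` populations of n independent elements and
    return the observed proportion of successes in each."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


# Small simulation (n kept modest so pure Python stays fast).
props = simulate_proportions(n=2_000, p=p, reps=500)
mean = sum(props) / len(props)
empirical_sd = math.sqrt(sum((x - mean) ** 2 for x in props) / (len(props) - 1))
print(empirical_sd, proportion_se(2_000, p))
```

The spread of the simulated proportions should land close to `proportion_se`, not to `binomial_count_sd`, which is the quantity to use when the goal is a distribution of percentages rather than counts.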

Tim
  • Thank you for the detailed explanation! I learned important points from this. We'll just say it's been a very long time since my last statistics class and I need to polish up some. – Curious Jorge - user9788072 Nov 24 '22 at 13:14
  • I think your description of "divide by sqrt(n) afterwards" is incorrect, or at least ambiguous. The standard error of the mean of the binomial distribution is sqrt(p(1-p)/n). So to get that from sqrt(np(1-p)) we need to divide by n, not sqrt(n). Right? – David Thiessen Nov 26 '22 at 14:35
  • @DavidLukeThiessen you are right, the wording was confusing. I removed it as the link alone suffices. – Tim Nov 27 '22 at 16:09