Grouping trials decreases standard error?

Question

My apologies if this is too rudimentary to be asked here. If it does not belong here, could someone recommend a more appropriate place to ask?

A little context. I am in a Senior Physics Lab class in college. Our first "experiment" involved rolling dice and calculating the probability of rolling each face. (The purpose is purely for practice in applying and understanding statistical analysis.) I chose to roll two dice separately with the intention of testing whether the two dice are identical. I know now I should have taken more samples, but I am not able to collect more data at this time.

I originally recorded my data in trials of 20 rolls, and I took 15 trials for each die. I then calculated the counts of each face within each trial and divided by 20 to obtain an estimate of the probability in each trial. I then averaged these probabilities over all the trials and calculated the standard error (calculated as the standard deviation of the sample). However, this produced very large standard error, from about 40% to 60% for each face. Here is the data for Die 1.

Face  Avg. Probability  Error  Percent Error
1     0.13              0.08   59.6
2     0.20              0.08   39.0
3     0.18              0.09   47.9
4     0.13              0.08   64.8
5     0.18              0.10   57.1
6     0.18              0.10   55.4

However, I discovered that by grouping the trials into larger sets, the error decreased. That is, I grouped trials 1 through 5, 6 through 10, and 11 through 15, and I calculated the probability of obtaining each face as the count of the face within the "grouped trial" divided by 100. This produced the following values for Die 1:

Face  Avg. Probability  Error  Percent Error
1     0.13              0.03   26.6
2     0.20              0.01   5.0
3     0.18              0.04   20.0
4     0.13              0.06   43.5
5     0.18              0.06   32.9
6     0.18              0.02   11.1

Have I definitely done something wrong, or is the change in the averages and the decrease in error explainable? Would I be doing something to invalidate my results by using the larger "group trial" samples? I can provide my original data in CSV format if desired, and of course, if I am saying things that make no sense, I am open to correction.

Edit:

I've found the mistake in my calculations with the groups of 20 rolls, and the means are now identical as expected. However, this hasn't resolved the question of why the error is so much larger when I group my data into a greater number of samples, which was really the question I meant to ask.

Yes, I understand that I could simply consider this problem using other means, but I'm still wondering why the error/standard deviation decreases. I think that question applies more generally than to this specific probability question. Or is the approach of trying to estimate the probability this way entirely invalid?

Sorry I'm not clear on what you mean by a 'trial'. Did you roll each die 300 times? — Peter Ellis, Feb 04 '13 at 09:24
I don't think the averages should change - could it be from rounding error? The decrease in "error" is expected. I believe you are estimating $\sqrt{p(1-p)/n}$ with $p=1/6$, first for $n=20$ and then for $n=100$. — mark999, Feb 04 '13 at 09:26
You are certainly doing something wrong but it's not clear what. I started an answer but it got too complicated... Basically any arbitrary grouping of 'trials' or anything else should not matter. You probably are wrongly changing your sample size for purposes of estimating standard error. — Peter Ellis, Feb 04 '13 at 09:42
@PeterEllis Yes. I rolled each die 300 times. I will look into my calculations tonight for mistakes. I'm doing these calculations in PostgreSQL, so it's entirely possible I messed up my queries somewhere. Thank you very much for your help. — jpmc26, Feb 04 '13 at 14:09
@mark999 I think that's correct, although I'm a bit unclear on where the $\sqrt{p(1-p)/n}$ is coming from. Sorry. My statistics knowledge and skills are quite rudimentary. Could you elaborate on why the standard deviation/error is expected to decrease? I doubt it's rounding error. I calculated this using PostgreSQL's NUMERIC data type, which keeps a very large number of decimal places. I explicitly rounded the values to 2 decimal places at the end. Thank you very much for your help. — jpmc26, Feb 04 '13 at 14:09
The reason I suggested the possibility of rounding error is that the "average probabilities" should be the same for both groupings. They should be the same as the estimate that you get if you just take the total number and divide by 300. — mark999, Feb 04 '13 at 19:28
The sum of the probabilities for the sets of 20 adds up to 1.03. I'm still trying to figure out what I did wrong. — jpmc26, Feb 05 '13 at 08:48
Well, I finally figured out what the heck was going on. I'm using a database, and I was using its COUNT function to count the dice. The problem is that some of the faces didn't show up in some trials, so those counts were 0. Because it didn't find any to count, the database just left those rows out entirely. So when it computed the average, it was only doing it with 14 probabilities or some other number less than 15 and ignoring the 0s in some trials. Tricksy. But when I put them in larger groups, there were no 0s, so those calculations came out right. Thanks for the help, everyone. — jpmc26, Feb 06 '13 at 05:18
Are you familiar with the terms "Bernoulli random variable" or "binomial random variable"? — mark999, Feb 06 '13 at 06:35
I just looked them up; I believe I was exposed to them in a rudimentary statistics course I took a couple years ago. I hesitate to say I'm actually familiar with them since I don't recall many details, although I do vaguely recall they have their own formulae for expected value and variance. Part of the reason I still wanted to pose the question about the standard deviations is because I wanted to know if this sort of "grouping" issue can arise in other kinds of experiments, where using mean and standard deviation might be more appropriate. Again, thank you both for all your help and patience. — jpmc26, Feb 06 '13 at 06:55
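
The missing-zero problem described in the comments above can be illustrated with a small sketch. This is not the original PostgreSQL query; it is a hypothetical Python imitation in which a trial with no occurrences of a face simply produces no row, the way a grouped COUNT would. The function name `mean_proportion` and the toy data are assumptions made up for illustration.

```python
from collections import Counter

def mean_proportion(trials, face, include_zero_trials):
    """Average the per-trial proportion of `face` over all trials."""
    proportions = []
    for rolls in trials:
        count = Counter(rolls)[face]
        if count == 0 and not include_zero_trials:
            # Imitates a grouped COUNT: no matching rows means no output row,
            # so this trial silently drops out of the average.
            continue
        proportions.append(count / len(rolls))
    return sum(proportions) / len(proportions)

# Hypothetical data: three trials of four rolls; face 1 never shows up in trial 3.
trials = [[1, 2, 3, 1], [1, 5, 6, 2], [2, 3, 4, 5]]

biased = mean_proportion(trials, face=1, include_zero_trials=False)
correct = mean_proportion(trials, face=1, include_zero_trials=True)
print(biased, correct)
```

With the zero trial dropped, the average is taken over two trials instead of three and overstates the proportion. With it included, the average matches the pooled proportion (3 ones in 12 rolls). Larger groups rarely contain a zero count, which is presumably why the grouped calculation came out right.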

score 2 · Accepted Answer · answered Feb 06 '13 at 08:26

For each $i=1, 2, \ldots$, define the random variable $X_i$ as follows: $X_i = 1$ if the $i$th roll of the die is a 1, and $X_i = 0$ otherwise. Suppose that for your die, the probability of rolling a 1 on any given roll is $p$. Then $P(X_i = 1) = p$ and $P(X_i = 0) = 1-p$. Because the rolls of the die are independent, the $X_i$ are independent random variables.

For each $i$, the expected value of $X_i$ is $E(X_i) = 1 \times p + 0 \times (1-p) = p$. The expected value of $X_i^2$ is $E(X_i^2) = 1^2 \times p + 0^2 \times (1-p) = p$. The variance of $X_i$ is $\text{Var}(X_i) = E(X_i^2) - E(X_i)^2 = p - p^2 = p(1-p)$.

If you roll your die $n$ times, you get $X_1, \ldots, X_n$ and you estimate $p$ by
$$ \hat{p}_n = \frac{1}{n} \sum_{i=1}^n X_i. $$
If $Y$ is any random variable and $c$ is any number, then $\text{Var}(cY) = c^2 \text{Var}(Y)$, so
\begin{align*}
\text{Var}(\hat{p}_n) &= \text{Var} \left( \frac{1}{n} \sum_{i=1}^n X_i \right) \\
&= \frac{1}{n^2} \text{Var} \left( \sum_{i=1}^n X_i \right) \\
&= \frac{1}{n^2} \sum_{i=1}^n \text{Var}(X_i) \qquad \text{because the $X_i$ are independent} \\
&= \frac{np(1-p)}{n^2} \\
&= \frac{p(1-p)}{n}.
\end{align*}
So the standard deviation of $\hat{p}_n$ is $\sqrt{p(1-p)/n}$. This is what you're estimating by taking the standard deviation of your sample proportions. If the die is fair, then $p=1/6$, and you first have $n=20$ and then $n=100$.

I've explained this for the proportion of ones, and the same applies to the other numbers 2,3,4,5,6.
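
To put numbers on the formula $\sqrt{p(1-p)/n}$, here is a minimal sketch that evaluates it for the two group sizes in the question. It assumes a fair die ($p = 1/6$); the helper name `proportion_sd` is made up for this example.

```python
import math

def proportion_sd(p, n):
    """Standard deviation of a sample proportion based on n independent rolls."""
    return math.sqrt(p * (1 - p) / n)

p_fair = 1 / 6  # assumption: the die is fair

sd_20 = proportion_sd(p_fair, 20)    # groups of 20 rolls
sd_100 = proportion_sd(p_fair, 100)  # groups of 100 rolls

print(round(sd_20, 3), round(sd_100, 3))
print(round(sd_20 / sd_100, 3))  # ratio of the two, equal to sqrt(100 / 20)
```

Under these assumptions the value for $n=20$ is about 0.083, in line with the 0.08 to 0.10 in your first table, and the value for $n=100$ is about 0.037. Your second table is based on only three groups, so its sample standard deviations are very noisy estimates of that second number.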

I will go over this more thoroughly later, but I believe this answers my question. Thank you! — jpmc26, Feb 06 '13 at 14:57

score 1 · Answer 2 · answered Feb 04 '13 at 18:34

The fact that you group your rolls into 15 trials of 20 rolls each is not relevant to estimating averages and standard errors. It is much more straightforward to treat this just as 300 rolls of each die.

The result of $n$ rolls of a die is a random variable with a multinomial distribution. The variance of the number of times the $i$th face comes up is $np_i(1-p_i)$. You don't know the real value of $p_i$ but can estimate it from your data or just use $1/6$ - for these purposes you will get similar (not identical) results.

As you are interested in scale-free proportions, not the number of times out of 300, you need to divide the number of occurrences by $n$ (in this case 300); and to get the variance of that estimated proportion you divide by $n^2$ (because if you multiply a random variable by a constant like $1/n$, its variance is multiplied by the constant squared). To convert the variance of that proportion into a standard error you take its square root, hence $\sqrt{\frac{p(1-p)}n}$ as suggested by mark999. It looks like you were using the wrong value of $n$.
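
As a rough worked example, assuming a fair die ($p = 1/6$, which is an assumption here rather than something estimated from the data) and treating all $n = 300$ rolls as one sample:
$$ \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(1/6)(5/6)}{300}} = \sqrt{\frac{5}{10800}} \approx 0.0215, $$
which is about 13% of $1/6$.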

I don't understand what you mean by "using the wrong value of n". As I understand it, he/she just took the sample standard deviation of the sample proportions. — mark999, Feb 04 '13 at 19:39
@mark999 You're correct. I took the mean and sample standard deviation of the sample proportions. I haven't been able to figure out what went wrong, yet. Thank you both very much for your help. — jpmc26, Feb 05 '13 at 05:03

Answer 3 · IMA · answered Feb 06 '13 at 08:48

I am not sure I understand completely what you are doing, but I believe this could be it. The mathematics has already been explained above, so here it is in plain prose:

You roll a sample of 300, each roll associated with an outcome. Then you can group those results as you like. You also have the expected values for a "fair" die and compare them to those outcomes by calculating standard errors, right?

What you are actually doing is assuming that the die is fair and then doing a regression-like experiment: you add up all results from the experiment (or a segment thereof), calculate the frequency and take this as an estimator, which you know should be 1/6. However, you have a random part in your model. Either your die may not be fair, or it may not land on its expected value "by chance". In either case, this "error" can be expected to be zero on average. It is added to each result (which would be expected to have a 1/6 chance of being any given side), making the results differ.

Using a lot of statistical results which you will learn, one can see that as the number of data points increases, this error term tends to diminish if it has certain properties (which are probably fulfilled in an experiment such as this). In short: the expected value of that error term is zero, and with more data points the variance of the regression residuals decreases.

So if you segment your groupings, you have less information about this error term. Your "estimate" of the true parameter (which should be 1/6), is simply more influenced by the error. If you combine your data points, you have much more information in your sample and those random errors start to even out more and more.

In fact, if you had an infinite sample, you'd end up with no errors at all in your estimates.

The point is that your estimators are probably unbiased. Their expected value is in fact 1/6 (mostly because your error is expected to be zero). In reality, however, you will be off, and this residual error decreases with more data points. Remember that when you add your data points together, the estimators for the individual segments do not come from the same data. Adding them together gives you more information; said differently, the errors for each sample, which are centered around zero, even out, pinning your estimate closer to the actual probability of the fair die you threw.
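
The "errors even out" idea can be checked with a quick simulation. The sketch below is not your data; it assumes a fair simulated die, a fixed random seed, and 5000 repeated groups of each size (all choices made up for illustration), and it measures how much the estimated proportion of ones scatters from group to group.

```python
import random
import statistics

def simulate_proportions(n_rolls, n_groups, face=1, rng=None):
    """Simulate n_groups groups of n_rolls fair-die rolls; return each group's proportion of `face`."""
    rng = rng or random.Random()
    proportions = []
    for _ in range(n_groups):
        hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == face)
        proportions.append(hits / n_rolls)
    return proportions

rng = random.Random(12345)  # fixed seed so repeated runs give the same numbers

# Scatter of the estimate across many groups of 20 rolls vs. many groups of 100 rolls.
sd_small = statistics.stdev(simulate_proportions(20, 5000, rng=rng))
sd_large = statistics.stdev(simulate_proportions(100, 5000, rng=rng))

print(sd_small, sd_large)
```

The scatter for groups of 100 rolls should come out clearly smaller than for groups of 20 rolls, which is the same effect you saw when you merged your trials.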

"If you combine your data points, you have much more information in your sample and those random errors start to even out more and more." I think this is the key point, and I think that holds even if the die is not fair. Thank you. — jpmc26, Feb 08 '13 at 03:33
Yes, it does. In fact, your die is probably not completely fair, as you realize. Your estimate, however, does improve as long as your estimation error is somewhat unsystematic (as I said, this is the basis for statistical inference, and if you understand this well, it will serve you in future studies). — IMA, Feb 12 '13 at 08:36
