I'm following this example (see below for the source):
| | Breast Cancer | No Breast Cancer | Total |
|---|---|---|---|
| DDT exposed | 360 | 13636 | 13996 |
| Unexposed | 1079 | 76313 | 77392 |
RR = 1.87
CI = 1.66 - 2.10
What I do not like is this: if I multiply the number of "No Breast Cancer" by a large number, even one million or more, these two numbers (the RR and its CI) hardly change at all. But now the event is very rare, and the difference between 360 and 1079 may well be due to random chance.
I'm surprised that the ratio between the positive samples and the whole population is not even considered when we compute the confidence interval.
Intuitively, 360 out of 13636 is different from 360 out of 136,360,000, and I would expect the confidence in the measurement to reflect this by being much lower.
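This can be checked numerically. The sketch below assumes the usual log-scale (Wald) interval for a risk ratio with z = 1.96, and assumes that the quoted figures treat 13636 and 76313 as the group sizes (that choice reproduces RR = 1.87 and CI = 1.66 - 2.10). The function name `risk_ratio_ci` and the scaling factor of 10,000 are illustrative only.

```python
import math

def risk_ratio_ci(cases_a, n_a, cases_b, n_b, z=1.96):
    """Risk ratio of group A vs. group B with a Wald interval on the log scale.

    Assumption: z = 1.96 (roughly a 95% interval).
    """
    rr = (cases_a / n_a) / (cases_b / n_b)
    # Standard error of log(RR); the 1/n terms vanish as the groups grow,
    # so the case counts dominate the width of the interval.
    se_log = math.sqrt(1 / cases_a - 1 / n_a + 1 / cases_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Group sizes as assumed above (13636 exposed, 76313 unexposed).
original = risk_ratio_ci(360, 13636, 1079, 76313)

# Same case counts, group sizes multiplied by 10,000 (hypothetical factor).
scaled = risk_ratio_ci(360, 13636 * 10_000, 1079, 76313 * 10_000)

print("original: RR=%.2f CI=%.2f-%.2f" % original)
print("scaled:   RR=%.2f CI=%.2f-%.2f" % scaled)
```

In this formula the group sizes enter only through the `1 / n_a` and `1 / n_b` terms, which are already small next to `1 / cases_a` and `1 / cases_b`, so making the groups larger has almost no effect on the interval.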
Am I missing something? Thanks.
EDIT: I'll try to explain it with an example:
First experiment. My sample is 1000 people, half young and half old. I do a random split; the worst case is a "shift" of 250 units from the average. Now I run my tests and I get 5 cases in group A and 25 in group B.
Second experiment, completely different and independent from the previous one. My sample is one million people, half young and half old. I do a random split. I think the average "shift" I can expect from the split is much larger (it should be possible to compute it), probably more than 250. Now I run my tests and I get 5 cases in group A and 25 in group B, just like before.
Isn't the "signal" in the second experiment too weak to conclude anything with the same "confidence" as in the first one?
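One way to probe the two experiments is to ask how likely a 5 vs. 25 split of the 30 cases would be if the grouping were purely random and had no effect. The sketch below assumes two equal-sized groups (500 each, then 500,000 each) and models the number of cases landing in group A as hypergeometric; the function name `prob_at_most` is hypothetical.

```python
from math import comb

def prob_at_most(k_max, total_cases, group_size, population):
    """P(group A contains at most k_max of the cases) under a random split.

    Assumption: the cases are scattered at random over the population,
    so the count in group A follows a hypergeometric distribution.
    """
    denom = comb(population, total_cases)
    numer = sum(
        comb(group_size, k) * comb(population - group_size, total_cases - k)
        for k in range(k_max + 1)
    )
    return numer / denom

# First experiment: 1000 people, two groups of 500, 30 cases in total.
p_small = prob_at_most(5, 30, 500, 1000)

# Second experiment: one million people, two groups of 500,000, 30 cases.
p_large = prob_at_most(5, 30, 500_000, 1_000_000)

print(f"P(at most 5 of 30 cases in group A), N=1000:      {p_small:.2e}")
print(f"P(at most 5 of 30 cases in group A), N=1,000,000: {p_large:.2e}")
```

Under these assumptions both probabilities are on the order of 1e-4: the chance of such a lopsided split depends mainly on the 30 cases, not on how many non-cases surround them.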
Source:
Please note that the numbers are slightly different from the ones in the table in the original example, but they match the ones used in the formula.