I'm perplexed by a reviewer's comment taking issue with the sample size in a study I conducted. Regarding the 2×2 contingency table below and a (significant) two-sample test for equality of proportions, he writes:
> 9 and 19 are too low, all cells should have 30 observations minimum, no statistical test will fix that.
Here is the contingency table in question:
| Gender | Characteristic X present | Characteristic X absent |
|---|---|---|
| Men | 19 | 194 |
| Women | 9 | 251 |
The hypothesis here was that there is a difference in proportions between men and women with respect to the characteristic of interest (i.e. is the % of women with characteristic X different from the % of men with this characteristic?). If it's in any way relevant, the population studied comprises about 25,000 individuals.
A two-sided two-sample test of proportions in R, `prop.test(c(9, 19), c(260, 213))`, gives a p-value of 0.02105. (As a side note, Fisher's exact test gives a p-value of 0.01751.)
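In case it helps, here is a minimal R sketch reproducing both tests from the table above (the object names are mine, not from the paper):

```r
# Counts from the contingency table above
women_x <- 9;  women_total <- 9 + 251   # 260 women in total
men_x   <- 19; men_total   <- 19 + 194  # 213 men in total

# Two-sided two-sample test for equality of proportions (with continuity correction)
prop.test(c(women_x, men_x), c(women_total, men_total))

# Fisher's exact test on the full 2x2 table
tab <- matrix(c(19, 194, 9, 251), nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Men", "Women"),
                              X = c("present", "absent")))
fisher.test(tab)
```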
In my discipline and in the journal where I intend to publish, p-values < 0.05 are usually deemed significant; the reviewer doesn't seem to take issue with this convention, so at first glance his objection is not about the significance threshold itself.
His point is not about model misspecification either, as the rationale for using only these two variables (gender/outcome) is given in the paper and he did not object to it.
His point seems to be that any sample should satisfy a rule of "at least 30 observed counts per cell" before anything can be inferred reliably.
I have never read such a requirement before, and I don't quite understand it: even with a huge sample size, the rule can fail to hold whenever the event is rare in one of the groups being compared (for instance, with a 1% prevalence, a group of 2,000 observations would still yield an expected count of only 20 in the "characteristic present" cell).
Still, I am trying to make sense of the reviewer's comment, either to answer it or to identify a mistake on my part. My understanding was that sample size has little or no impact on the type I error rate (see "Can a small sample size cause type 1 error?"). The reviewer insists, though, which makes me wonder whether I am somehow mistaken, misread something, conducted an incorrect test, or something else.
So, basically, I'm looking for a second opinion. Is there something incorrect in my analysis here? If not, do you know of reliable academic references I could use to support my answer to his comment? And if I am mistaken, I would of course be more than interested in an explanation.
Thanks,