I have p-values from several tests (k ≈ 50) that I'd like to combine. They aren't independent, and I don't have a model of their correlation structure. In the worst case, my effective sample size might be more like 15.
I'm dealing with marginally significant results (about 10% remain significant after a two-stage Benjamini–Hochberg FDR correction at α = 0.05), so I need to be very careful both to avoid false positives and to avoid wasting statistical power.
Option 1: Bonferroni argument
The Bonferroni correction is the conservative standard for multiple tests in the setting where any single positive test is taken as a positive result.
You usually divide your significance threshold by $k$, but this is equivalent to multiplying each p-value by $k$, which is really a linear approximation valid when $p \ll 1$. The exact quantity is $\tilde p = 1-(1-p)^k$ (the Šidák-style correction), which gives you the probability of seeing at least one of $k$ independent tests pass under the null distribution.
By this argument, if I think my effective sample size is no less than $k/3$, I could correct each p-value to $\tilde p = 1-(1-p)^3$, then apply Fisher's method. On my data, this gives something like $p_{combined}$ = 5×10^-3.
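In code, Option 1 looks like this (a sketch using synthetic uniform draws as placeholders for my actual p-values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p = rng.uniform(size=50) ** 2          # placeholder p-values; substitute your own

# Exact "at least one of 3 replicates passes" correction: p_tilde = 1 - (1 - p)^3
p_tilde = 1.0 - (1.0 - p) ** 3

# Fisher's method on the corrected p-values
stat = -2.0 * np.sum(np.log(p_tilde))
p_combined = stats.chi2.sf(stat, df=2 * len(p_tilde))
```

Note that $\tilde p \ge p$ always, so the correction can only make each individual test less significant before combining.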
Option 2: Fisher's method argument
My understanding is that Fisher's method: (1) converts each p-value to $-2\ln p_i$, which under the null is χ²-distributed with 2 DoF (equivalently $\|z_i\|^2$ for a two-dimensional standard normal $z_i$); (2) sums these up; then (3) tests whether the sum looks significant for a χ² with DoF = $2k$. On my data, this gives something absurd like 1×10^-11, which is certainly spurious.
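For concreteness, the plain (uncorrected) version of Fisher's method, which matches scipy's built-in (toy p-values, not my data):

```python
import numpy as np
from scipy import stats

p = np.array([0.01, 0.04, 0.03, 0.20])       # toy p-values

stat = -2.0 * np.log(p).sum()                # ~ chi^2 with 2k DoF under the null
p_combined = stats.chi2.sf(stat, df=2 * len(p))

# scipy's built-in combination gives the same answer
stat_scipy, p_scipy = stats.combine_pvalues(p, method='fisher')
```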
My intuition is this: say my data secretly consisted of individual tests replicated three times each. Each replicate would give a (possibly noisy) estimate of the same p-value.
I could hypothetically average these together into $k/3$ tests, then apply Fisher's method, testing against a χ² with DoF = $2k/3$.
Instead of testing whether $\sum_{i=1}^k\|z_i\|^2$ looks significant against a χ² with $2k$ DoF, I could test whether $(\sum_{i=1}^k\|z_i\|^2)/3$ looks significant against a χ² with $2k/3$ DoF. On my data, this gives $p_{combined}$ = 1×10^-5.
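The deflated version is a one-line change, since `chi2.sf` accepts non-integer DoF (again with toy p-values, and $c = 3$ standing in for my assumed replication factor):

```python
import numpy as np
from scipy import stats

p = np.array([0.01, 0.04, 0.03, 0.20, 0.02, 0.50])  # toy p-values
k, c = len(p), 3                                    # c = assumed replication factor

stat = -2.0 * np.log(p).sum()
# Plain Fisher: stat vs chi^2(2k); deflated: stat/c vs chi^2(2k/c)
p_plain = stats.chi2.sf(stat, df=2 * k)
p_deflated = stats.chi2.sf(stat / c, df=2 * k / c)
```

As expected, for a significant combined statistic the deflated test is less extreme than the plain one.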
Option 3: Preprocessing with Benjamini Hochberg
Say I trust the corrected p-values from a two-stage Benjamini–Hochberg FDR correction (α = 0.05). I think this is fine, since this FDR correction depends on the distribution of p-values and doesn't require independence? (Correct me if this is wrong.)
I can run FDR correction first, then apply Fisher's method to combine the corrected p-values. On my data, this gives $p_{combined}$ = 2×10^-3.
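A sketch of this pipeline, using statsmodels' two-stage BH implementation (`fdr_tsbh`) on toy p-values; whether feeding FDR-adjusted p-values into Fisher's method is valid is exactly what I'm unsure about:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

p = np.array([0.001, 0.008, 0.012, 0.03, 0.2, 0.4, 0.6, 0.9])  # toy raw p-values

# Two-stage Benjamini-Hochberg FDR correction
reject, p_corr, _, _ = multipletests(p, alpha=0.05, method='fdr_tsbh')

# Fisher's method on the FDR-adjusted p-values
stat = -2.0 * np.sum(np.log(p_corr))
p_combined = stats.chi2.sf(stat, df=2 * len(p_corr))
```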
Question: Are any of these methods considered kosher and, if so, does anyone know a reference to back them up? (Other solutions are very welcome, of course.)