Say I have a population of 5,000 objects. Each object has a real-valued feature vector of length M > 100. The 5,000 objects can be split into 3 groups: A, B, C. These groups are disjoint, non-trivial (> 100 objects each) subsets of the total population, with size(A) + size(B) + size(C) = 5,000.

Consider comparing each component of the M-length feature vector for A vs. B. This results in M p-values. I account for multiple comparisons by applying FDR, as usual.

However, a priori, I am interested in all comparisons. That is: for each pairwise grouping (A vs. B, B vs. C, and A vs. C), I want to test whether each component of the M-length feature vector differs between the two groups.

Question 1:

Should I perform FDR separately (i.e. FDR on the p-values of A vs. B, FDR on the p-values of B vs. C, FDR on the p-values of A vs. C), or should I pool all of the p-values together (a length 3M vector) and then perform FDR?
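To make the two options concrete, here is a minimal sketch in plain Python. It assumes that "FDR" means the Benjamini-Hochberg step-up procedure at level q = 0.05; the helper name `bh_reject`, the value M = 200, and the p-values themselves (simulated placeholders, not real data) are all made up for illustration.

```python
import random

def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up (hypothetical helper).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k_max = rank
    # Reject the k_max smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject

random.seed(0)
M = 200  # assumed number of features, for illustration only
pairs = ["AvB", "BvC", "AvC"]

# Placeholder p-values: uniform noise, with the first 20 features per
# comparison made artificially small to mimic true differences.
p = {pair: [random.random() for _ in range(M)] for pair in pairs}
for pair in pairs:
    for j in range(20):
        p[pair][j] *= 1e-4

# Option 1: FDR applied separately to each comparison (3 vectors of length M).
separate = {pair: bh_reject(p[pair]) for pair in pairs}

# Option 2: FDR applied once to the pooled vector of length 3M,
# then split back into the three comparisons.
pooled_flat = bh_reject([x for pair in pairs for x in p[pair]])
pooled = {pair: pooled_flat[i * M:(i + 1) * M] for i, pair in enumerate(pairs)}

for pair in pairs:
    print(pair, "separate:", sum(separate[pair]), "pooled:", sum(pooled[pair]))
```

The only difference between the two options is the set of p-values that the ranks k and the count m are computed over, so the same p-value can be rejected under one option and not the other.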

Question 2:

Given the theoretical motivation of the multiple comparisons problem, I believe I need to pool all the p-values together. Are the pairwise FDR computations still valid regardless? Each pairwise comparison (A vs. B, B vs. C, A vs. C), taken on its own, also fits the FDR setting.

However, within A vs. B, the M hypothesis tests are mutually independent. But when I pool all the p-values together, the 3M hypothesis tests are NOT independent (since A + B + C = the whole population). Doesn't this undermine the notion that the pairwise FDR values and the "pooled" FDR values are both legitimate?

cmo