Questions tagged [false-discovery-rate]

The expected fraction of rejected null hypotheses that are falsely rejected, i.e. the fraction of significant findings for which the null hypothesis is actually true. One method to control the FDR in multiple testing is the Benjamini-Hochberg procedure.

The False Discovery Rate (abbreviated as FDR) is the expected fraction of rejected null hypotheses that are falsely rejected, i.e., the fraction of "significant" findings for which the null hypothesis is actually true. Such rejections are called "false discoveries". Given $V$ false discoveries among a total of $R$ rejected hypotheses, the FDR is formally defined as

$$ FDR = E\left[\frac{V}{R}\right], $$

where $V/R$ is taken to be $0$ when no hypotheses are rejected ($R = 0$).
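
As a concrete illustration of $V$, $R$, and their ratio, here is a small R simulation (a toy sketch with made-up numbers of nulls and signals, not part of any formal procedure). It runs 1,000 tests at an uncorrected 5% threshold and computes the realized false discovery proportion:

    # Toy simulation: 900 true nulls and 100 real effects.
    set.seed(42)
    m       <- 1000
    is_null <- c(rep(TRUE, 900), rep(FALSE, 100))
    # p-values: uniform under the null, concentrated near 0 under the alternative
    p <- ifelse(is_null, runif(m), rbeta(m, 0.5, 10))

    reject <- p <= 0.05            # naive, uncorrected rejection at the 5% level
    R <- sum(reject)               # total rejections
    V <- sum(reject & is_null)     # false discoveries among them
    V / R                          # realized false discovery proportion

Averaging $V/R$ over many such simulated data sets approximates the FDR of this naive rule, which here comes out far above 5% even though every individual test was run at the 5% level.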

Controlling the false discovery rate has become a popular method for dealing with the multiple comparisons problem, and has seen wide acceptance in a variety of fields.

Benjamini and Hochberg introduced the first procedure for controlling the FDR in 1995 [1]. It works as follows:

Order the $p$-values of the $m$ tests conducted as $P_{(1)} \leq P_{(2)} \leq \dots \leq P_{(m)}$. For a given level $\alpha$, find the largest $k$ such that $P_{(k)} \leq \frac{k}{m}\alpha$, and reject the hypotheses $H_{(i)}$ corresponding to the $k$ smallest $p$-values, $i = 1, \dots, k$.
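
The step-up rule translates directly into a few lines of R. The following is a minimal sketch, not the canonical implementation; the function name bh_reject and the example data are purely illustrative:

    # Benjamini-Hochberg step-up procedure: returns a logical vector of rejections.
    bh_reject <- function(p, alpha = 0.05) {
      m      <- length(p)
      ord    <- order(p)                  # indices that sort the p-values ascending
      crit   <- (1:m) / m * alpha         # BH critical values k/m * alpha
      passed <- which(p[ord] <= crit)     # ranks whose p-value is below its critical value
      reject <- logical(m)                # default: reject nothing
      if (length(passed) > 0) {
        k <- max(passed)                  # largest k with P_(k) <= (k/m) * alpha
        reject[ord[1:k]] <- TRUE          # reject the hypotheses with the k smallest p-values
      }
      reject
    }

    # Example: three clearly small p-values among mostly uniform ones
    set.seed(1)
    p <- c(1e-4, 5e-4, 2e-3, runif(17))
    which(bh_reject(p, alpha = 0.05))

Thresholding p.adjust(p, method = "BH") at $\alpha$ gives the same set of rejections and is the usual way to apply the procedure in practice.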

It was later shown by Benjamini and Yekutieli that the above procedure still controls the FDR under certain forms of dependence between the test statistics, specifically under positive regression dependency (PRDS). They also proposed a more conservative variant that controls the FDR under arbitrary dependence [2].

There have been several modifications and extensions of the method proposed by Benjamini and Hochberg, notably:

  1. The $q$-value extension by John D. Storey, implemented in the qvalue R package available on Bioconductor and GitHub [3,4]; see also a Shiny web implementation of the qvalue package [10]. (A brief usage sketch for this and the next item follows the list.)
  2. Local false discovery rates, implemented in the R package fdrtool on CRAN [5,6].
  3. Stratified FDR (sFDR) as implemented in Lei Sun's Perl script SFDR [7,8].
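
For items 1 and 2 above, a rough usage sketch follows. It assumes the qvalue (Bioconductor) and fdrtool (CRAN) packages are installed; the return fields shown ($pi0, $qvalues, $lfdr, $qval) reflect common versions of these packages and should be checked against their documentation:

    # simulated p-values: mostly uniform nulls plus some signal near zero
    set.seed(123)
    p <- c(runif(1800), rbeta(200, 0.5, 20))

    # q-values (Storey)
    library(qvalue)
    qobj <- qvalue(p)
    qobj$pi0                   # estimated proportion of true null hypotheses
    head(qobj$qvalues)         # q-value for each test

    # local false discovery rates
    library(fdrtool)
    fit <- fdrtool(p, statistic = "pvalue")
    head(fit$lfdr)             # local fdr per test
    head(fit$qval)             # tail-area-based q-values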

The original procedure (often called the BH procedure) is available as a default method in many software packages and, in R, via p.adjust(p, method = "BH"). The Benjamini-Yekutieli extension for arbitrary dependence is available via p.adjust(p, method = "BY") [9].
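
For example, with a hypothetical vector of raw p-values:

    p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.500, 0.900)
    p_bh <- p.adjust(p, method = "BH")   # Benjamini-Hochberg adjusted p-values
    p_by <- p.adjust(p, method = "BY")   # Benjamini-Yekutieli, valid under arbitrary dependence
    which(p_bh <= 0.05)                  # hypotheses rejected at FDR level 0.05
    which(p_by <= 0.05)                  # BY is more conservative, so typically fewer rejections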

References and Further Reading

[1] Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57(1), 289–300.

[2] Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29(4), 1165–1188. http://doi.org/10.1214/aos/1013699998

[3] Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64(3), 479–498.

[4] https://github.com/jdstorey/qvalue

[5] Efron, B., Tibshirani, R., Storey, J. D., & Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96, 1151–1160.

[6] https://cran.r-project.org/web/packages/fdrtool/fdrtool.pdf

[7] Sun, L., Craiu, R. V., Paterson, A. D., & Bull, S. B. (2006). Stratified false discovery control for large-scale hypothesis testing with application to genome-wide association studies. Genetic Epidemiology, 30(6), 519–530. http://doi.org/10.1002/gepi.20164

[8] http://www.utstat.toronto.edu/sun/Software/SFDR/

[9] https://stat.ethz.ch/R-manual/R-devel/library/stats/html/p.adjust.html

[10] http://qvalue.princeton.edu/

280 questions

12 votes, 3 answers
How do FDR procedures estimate a False Discovery Rate without a model of base rates?
Can someone explain how FDR procedures are able to estimate an FDR without a model / assumption of the base rate of true positives? (user4733)

8 votes, 1 answer
FDR correction when tests are correlated
I have a data set with small number of samples and large number of variables. I did hypothesis testing (T test) on each of the variable and got a number of p values. However, the variables are correlated to each other and FDR correction… (WCMC)

7 votes, 1 answer
Understanding the Benjamini-Hochberg method proof
I'm trying to understand the proof in Benjamini & Hochberg's 1995 paper, specifically the Lemma in the appendix, as the rest of the proof is short and follows it. I got stuck somewhere after equation (5), where it says: "Thus all $m_0+j_0$ hypotheses… (Maverick Meerkat)

5 votes, 1 answer
Plain language definition of positive regression dependency on each one from a subset (PRDS)
Existing questions have discussed the concept of PRDS in the context of Benjamini and Hochberg's false discovery rate correction (FDR-BH). However, I think the definitions of PRDS in these (and in Benjamini and Yekutieli, 2001) are not intuitive for… (Joni)

5 votes, 2 answers
What is the difference between the FDR in Benjamini-Hochberg/Bonferroni vs a local FDR?
I am wondering if the local FDR in Efron's literature is different than the FDR associated with Benjamini-Hochberg and if it is perhaps talking about something else. (user321627)

5 votes, 1 answer
What is different about the q-value and local FDR?
What is different about the q-value and local FDR when both are defined as posterior probabilities of the null being true? For example in Storey (2010), "Under these modeling assumptions, it follows that $\text{q-value}(p_i) = \min_{t \ge p_i} \Pr(H_i = 0 \mid P_i \le t)$",… (BKV)

4 votes, 0 answers
Multiple testing correction alternatives
I have a data set with about 6,000 Fisher's exact tests. If I want to do a multiple testing correction to control the FWER, what are the advantages of one of these methods vs the other: calculating a q-value for each test, using a…

4 votes, 1 answer
Multiple testing and FDR on multiple-pairs
Suppose, I have three time-points, Ta, Tb and Tc. Let Ta be control and Tb and Tc be the effect of a drug 4 hours and 8 hours after treatment. For each pair of time-points, I compare about 15000 observations (genes, for differential expression, to… (Arun)

2 votes, 1 answer
Estimate the false discovery proportion for a fixed number of discoveries
Suppose we are testing a large number ($p$) of hypothesis tests. Most commonly, a correction is applied to control the false discovery rate (FDR) at a low value, say $\alpha$. But now imagine I want to report the top 10 features with the largest test… (Knarpie)

2 votes, 0 answers
Correct to use FDR q-values for a small number of variables?
I would like to correct for multiple testing and I was considering using FDR q-values. In the original paper by Storey, they used this approach for genomewide data, where there are several thousands of variables. However, my dataset is smaller and… (Alex)

2 votes, 0 answers
Confused by Benjamini-Hochberg-like false detection rate calculation in paper
I'm trying to figure out where the false detection rate correction in a genetics paper is coming from and failing. The paper in question is looking at a sequencing method where every sample is read multiple times giving, each time, either a positive…

1 vote, 1 answer
Estimating effect after Benjamini-Hochberg procedure
I run an AB test and I have results at city level (keep in mind I run a one-sided test): city,diff_means,p_value,ci_lower,ci_upper TGN, 0.1, 0.001, 0.09, 0.11 BCN, 0.0, 0.78, -1, 1 REU, -0.1, 0.99, -0.11, …

1 vote, 0 answers
Calculating FDR in batches while taking into account the total number of tests
I am running a large number (potentially hundreds of millions to billions) of null hypothesis significance tests (specifically a Poisson test, but the question is general). I would like to apply to them a correction for multiple hypothesis tests (e.g.… (user1614062)

1 vote, 1 answer
Empirical FDR from real and null distributions -- do sample sizes need to match?
Suppose I have a set of values (correlation, in this case) that come from a real distribution, and another set of values that are from a simulated null distribution. To calculate the value that corresponds to FDR = 0.05, I've followed what…

1 vote, 0 answers
Problem in understanding False Discovery Rate (FDR)
I am facing a problem understanding the concept of FDR or False Discovery Rate in Multiple Hypothesis Testing. May be the question is silly, please try to bear with me. Actually, when we are talking about P-Value for a single hypothesis test, a…