Interpreting the Q-Q plot of a GWAS

Question

In the attached Q-Q plot, obtained from a genome-wide case/control study, the negative log of the p-values is plotted on the x (observed) and y (expected) axes.

A few SNPs at the extreme end of the distribution deviate from the expected null distribution. But since the observed p-values for these SNPs are HIGHER (i.e. less significant) than the expected p-values, how should this be interpreted, as opposed to a situation in which the observed p-values were more significant than the expected values?

Thanks

Is there any possibility that the $p$-values are being restricted by the characteristics of the test you are using to generate them? For instance some non-parametric tests can only take on a finite number of values. — mdewey, Jun 30 '17 at 12:32
Hi, thank you for your comments. I don't think so. The p-values were obtained from a logistic regression analysis in plink. — Sam, Jun 30 '17 at 12:55
There is something wrong with the data. This can commonly stem from misread files. — Anna Docherty, Feb 02 '21 at 01:02
Looking at the data for the smallest p-value would be informative, but I think @mdewey is probably right. For logistic regression, complete separation and premature convergence can both give lower limits on p-values. — Thomas Lumley, Feb 02 '21 at 02:02

score 0 · Answer 1 · answered Nov 13 '23 at 21:39

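All points on the dotted line represent data (p-values in this case) that occur as frequently as you'd expect by chance under the null hypothesis. Assuming the usual layout, with expected -log10(p) on the x-axis and observed -log10(p) on the y-axis, off-diagonal points represent p-values that are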

lower than expected if the point lies above the line.
higher than expected if the point lies below the line.

In the context of GWAS/SNPs, points above the line are SNPs that are more significant than expected under the null, and points below the line mean your p-values are poorly calibrated. This usually comes from using misspecified models (especially mixed models).
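To check calibration numerically rather than by eye, here is a minimal sketch in plain Python. It is not what plink does internally; the function names `qq_points` and `genomic_inflation` are made up for illustration, the p-values are simulated, and the "conservative" p-values are a hypothetical stand-in for a miscalibrated test. It computes the Q-Q coordinates and the genomic inflation factor (median 1-df chi-square statistic divided by its null median, about 0.455), assuming two-sided p-values from a 1-df test.

```python
import math
import random
from statistics import NormalDist, median


def qq_points(pvals):
    """Return (expected, observed) -log10 p-values, most significant first."""
    p = sorted(pvals)
    n = len(p)
    # Under the null, p-values are Uniform(0, 1); use (i + 0.5) / n as quantiles.
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    observed = [-math.log10(x) for x in p]
    return expected, observed


def genomic_inflation(pvals):
    """Median 1-df chi-square statistic divided by its median under the null."""
    norm = NormalDist()
    # A two-sided p-value maps to chi-square = z**2 with z = Phi^-1(1 - p/2).
    # The map is monotone, so converting the median p-value is enough.
    observed_median = norm.inv_cdf(1 - median(pvals) / 2) ** 2
    null_median = norm.inv_cdf(0.75) ** 2
    return observed_median / null_median


rng = random.Random(0)
# Simulated, well-calibrated null p-values in (0, 1].
null_p = [1.0 - rng.random() for _ in range(20000)]
# Hypothetical conservative test: every p-value is pushed toward 1.
conservative_p = [p ** 0.5 for p in null_p]

exp_null, obs_null = qq_points(null_p)
exp_cons, obs_cons = qq_points(conservative_p)
lam_null = genomic_inflation(null_p)
lam_cons = genomic_inflation(conservative_p)

print(f"lambda (null): {lam_null:.3f}")
print(f"lambda (conservative): {lam_cons:.3f}")
print(f"top SNP, conservative: expected {exp_cons[0]:.2f}, observed {obs_cons[0]:.2f}")
```

For the simulated null p-values the inflation factor should land close to 1, while the conservative set gives a value well below 1 and a top point that falls under the diagonal, which is the pattern described in the question.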

(I'm very late I know)
