In the attached q-q plot obtained from a genome-wide case/control study, the negative log of the p-values is plotted on the x (observed) and y (expected) axes.

A few SNPs at the extreme end of the distribution deviate from the expected null distribution. But since the observed p-values for these SNPs are HIGHER (i.e. less significant) than the expected p-values, how should this be interpreted, as opposed to a situation in which the observed p-values were more significant than the expected values?

Thanks

[Figure: q-q plot of observed versus expected negative log p-values]

Sam
  • Is there any possibility that the $p$-values are being restricted by the characteristics of the test you are using to generate them? For instance, some non-parametric tests can only take on a finite number of values. – mdewey Jun 30 '17 at 12:32
  • Hi, thank you for your comments. I don't think so. The p-values were obtained from a logistic regression analysis in plink. – Sam Jun 30 '17 at 12:55
  • There is something wrong with the data. This can commonly stem from misread files. – Anna Docherty Feb 02 '21 at 01:02
  • Looking at the data for the smallest p-value would be informative, but I think @mdewey is probably right. For logistic regression, complete separation and premature convergence can both give lower limits on p-values. – Thomas Lumley Feb 02 '21 at 02:02

1 Answer

All points on the dotted line represent data (p-values in this case) that occur as frequently as you'd expect by chance given the null hypothesis. Off-diagonal data represents p-values that are

  • lower than expected if the point lies above the line.
  • higher than expected if the point lies below the line.

In the context of GWAS/SNPs, points above the line are SNPs with smaller p-values than expected under the null (candidate associations), while points below the line mean your p-values are poorly calibrated. This usually results from misspecified models (especially mixed models).
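To make the above/below distinction concrete, here is a minimal Python sketch. It assumes the conventional orientation (expected on x, observed on y, both as -log10 p) and uses the plotting position (i - 0.5)/n for the expected quantiles, which is one common choice among several. The function names `qq_points` and `classify`, and the p-value floor of 1e-3, are hypothetical and only for illustration; they are not taken from plink or from the question's data.

```python
import numpy as np

def qq_points(pvalues):
    """Return (expected, observed) -log10 p-values for a q-q plot."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    # Under the null, p-values are Uniform(0, 1), so the i-th smallest
    # of n is expected near (i - 0.5) / n (assumed plotting position).
    expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)
    observed = -np.log10(p)
    return expected, observed

def classify(expected, observed, tol=1e-9):
    """Label each point relative to the diagonal observed == expected."""
    diff = observed - expected
    return np.where(diff > tol, "above", np.where(diff < -tol, "below", "on"))

# Perfectly calibrated p-values: every point sits on the diagonal.
n = 10_000
grid_p = (np.arange(1, n + 1) - 0.5) / n

# Hypothetical lower limit on the p-values (for example from a test that
# cannot return anything smaller): the extreme tail drops below the line.
floored_p = np.maximum(grid_p, 1e-3)

expected, observed = qq_points(floored_p)
labels = classify(expected, observed)
print("points below the line:", int((labels == "below").sum()))
print("points above the line:", int((labels == "above").sum()))
```

Only the smallest p-values are affected by the floor, so the deviation shows up at the extreme end of the plot while the bulk of the points stays on the diagonal.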

(I'm very late I know)

SARose