
Running the following Python code, I often get very small p-values, sometimes even around 0.01.

import numpy as np
from scipy.stats import ttest_ind

a, b = np.random.normal(0, 1, 100000), np.random.normal(0, 1, 100000)
ttest_ind(a, b).pvalue

Since the means and standard deviations are identical and my sample size is fairly large, I'd expect to get p-values far away from zero.

Here is a histogram of the p-values I'm getting: [histogram of p-values]

Why does this happen?

  • 1
That's a seriously wrong histogram if in fact it represents a large number of p-values from tests of the null distribution! It should look nearly flat and horizontal: see https://stats.stackexchange.com/search?q=p-value+uniform. How many tests does it reflect? – whuber Feb 16 '19 at 22:35
  • 3
    It looks pretty uniform to me -- just with a lot of bins relative to the number of observations; looks like 100 bins, so with the average looking close to 10, presumably the count in each bin would be binomial(1000,0.01) (which I admit is odd given the larger number the code indicates). The histogram looks reasonably consistent with that binomial (perhaps also with some allowance for some program-choice of bin origin and width based on the sample). – Glen_b Feb 16 '19 at 23:41
  • 1
    @Michael If H0 is true, what is the probability that $p\leq\alpha$ for any $\alpha$? (i.e. what is the probability of a type I error?). As whuber indicates many posts on site discuss this issue. – Glen_b Feb 16 '19 at 23:50
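
As a quick illustration of the point raised in these comments (my own Python sketch, not from the original thread): if H0 is true, the probability that $p\leq\alpha$ is exactly $\alpha$, so about 5% of simulated p-values should land at or below 0.05.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# 2000 t-tests on pairs of samples drawn from the same N(0, 1) population,
# i.e. 2000 replications under the null hypothesis
pvals = np.array([
    ttest_ind(rng.normal(0, 1, 500), rng.normal(0, 1, 500)).pvalue
    for _ in range(2000)
])
frac = np.mean(pvals <= 0.05)  # should be close to 0.05 under H0
```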

1 Answer


Using R, I generated 300 p-values from t-tests under the same setting you used. Here is the histogram of the p-values:

[histogram of the 300 p-values]

Here is the quantile-quantile plot which depicts the quantiles of the p-value distribution against uniform quantiles:

[quantile-quantile plot of the p-values against uniform quantiles]

As expected, the distribution of the p-values looks uniform.
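
To check that impression more formally, one could run a Kolmogorov–Smirnov test of the simulated p-values against Uniform(0, 1). A sketch of that check (my own addition, in Python rather than the answer's R):

```python
import numpy as np
from scipy.stats import ttest_ind, kstest

rng = np.random.default_rng(42)
# 300 p-values under the null, mirroring the simulation above
# (smaller samples per test to keep the sketch fast)
pvals = np.array([
    ttest_ind(rng.normal(size=1000), rng.normal(size=1000)).pvalue
    for _ in range(300)
])
# a large KS p-value means no evidence against uniformity
ks = kstest(pvals, "uniform")
```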

Finally, here are the p-values plotted against their index:

[p-values plotted against their replication index]

As you can see in this last plot, the p-values can be pretty much anything in the range 0 to 1.

In fact, as Geoff Cumming explains in his excellent video Dance of the p-values, when you replicate a study under similar conditions many times, the p-value from the current replication tells you very little about the p-value to expect from the next replication; it gives only extremely vague information about it.

Towards the end of the video, Geoff Cumming lists 80% prediction intervals where you can expect the p-value from the next replication to be found when you know the p-value from the current replication. In particular:

P-value from current replication    80% prediction interval for p-value from next replication
0.05                                0.00008 to 0.44

The video goes into more depth so watching it is worthwhile: https://youtu.be/5OL1RqHrZQ8.

If you wanted to see the R code I used, here it is:

set.seed(101)

p.value <- NULL
for (i in 1:300) {
   # two samples from the same N(0, 1) population, as in your setup
   a <- rnorm(100000, 0, 1)
   b <- rnorm(100000, 0, 1)
   t <- t.test(a, b, var.equal = TRUE)
   p.value <- c(p.value, t$p.value)
}

# histogram of the p-values
require(MASS)
truehist(p.value)

# quantile-quantile plot against the uniform distribution
require(car)
qqPlot(p.value, distribution = "unif")

# p-values plotted against their index
plot(p.value, type = "h", col = "dodgerblue")
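
For readers who prefer to stay in Python (the question's language), here is a rough equivalent of the simulation, binning the p-values with NumPy rather than plotting (my own sketch, not part of the original answer):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(101)
# 300 replications of the question's two-sample t-test under the null
p_values = np.array([
    ttest_ind(rng.normal(0, 1, 100_000), rng.normal(0, 1, 100_000)).pvalue
    for _ in range(300)
])
# with 10 equal-width bins, each should hold roughly 30 p-values
counts, edges = np.histogram(p_values, bins=10, range=(0, 1))
```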

If you use larger bin widths for your histogram, you should get a nicer-looking histogram; the bin width you are currently using is far too small for the number of p-values you have.

Nick Cox
Isabella Ghement