Does sample size affect the distribution of p-values?

Question

I'm conducting a simulation study where I repeatedly sample heights from a normal distribution with a mean of 175 cm and a standard deviation of 6 cm. I then calculate a one-sample t-test p-value for each sample against a null hypothesis mean of 175 cm. My sample sizes are quite large (200 observations per sample).

Theoretically, I expected the p-values to follow a unimodal distribution. My reasoning was that small samples give uniformly distributed p-values, but a larger sample makes the sample mean a more accurate estimate, closer to 175, which should push the p-values toward 1 (a left-skewed distribution). I wanted to check whether a larger sample size really changes the distribution. Specifically, I used this code:

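import numpy as np
import matplotlib.pyplot as plt

# p_value_calculator is my own helper (definition not shown); it is called with
# the sample, the sample size, and the null mean, and returns the p-value.
iterations = 5000
p_values = np.empty(iterations)
for i in range(iterations):
    # Draw 200 heights from N(175, 6), so the null hypothesis is true
    sample_heights = np.random.normal(175, 6, 200)
    p_values[i] = p_value_calculator(sample_heights, 200, 175)
plt.hist(p_values, bins=30, edgecolor='black')
plt.xlabel('p-value')
plt.ylabel('Count')
plt.title('Distribution of p-values')
plt.show()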

Despite the large sample size, the distribution of p-values appears uniform, which initially surprised me. Does sample size affect the distribution of p-values, or is it always uniform when the null hypothesis is true?

https://stats.stackexchange.com/questions/376772 is also germane. For more, search our site for posts about uniform p-value distributions. — whuber, Jun 08 '23 at 16:36

score 5 · Accepted Answer · edited Jun 08 '23 at 17:56

As others have said, p-values always have a uniform distribution when the null hypothesis is true. I wanted to address your intuition that, because a larger sample brings the sample means closer to the population mean, the p-values should be closer to 1 and therefore unimodal. It's true that the sample means will be closer to the population mean, but remember that it's the test statistic that determines the p-value. Written as a z-statistic, it is a function of both the sample mean and the standard error, i.e., $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ (the t-statistic has the same form, with the sample standard deviation $s$ in place of $\sigma$). As the sample grows, the standard error shrinks at the same rate as the spread of the sample means, so the z-statistic keeps the same standard normal distribution and the p-value keeps its uniform distribution.
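The argument above can be checked numerically. The following is a minimal sketch, not the asker's code: it assumes a known-$\sigma$ z-test (to match the formula above), and the seed and the sample sizes 20, 200 and 1000 are arbitrary illustrative choices. It reports the spread of the sample means, the spread of the z-statistics, and the share of p-values at or below 0.05 for each sample size.

```python
import math
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
mu, sigma, iterations = 175.0, 6.0, 5000

def simulate(n):
    """Draw `iterations` samples of size n under the null; return means, z, p."""
    samples = rng.normal(mu, sigma, size=(iterations, n))
    means = samples.mean(axis=1)
    # z-statistic with known sigma: (x_bar - mu) / (sigma / sqrt(n))
    z = (means - mu) / (sigma / math.sqrt(n))
    # Two-sided p-value for a standard normal statistic: P(|Z| >= |z|)
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    return means, z, p

# Illustrative sample sizes (assumption, not from the original post)
results = {n: simulate(n) for n in (20, 200, 1000)}
for n, (means, z, p) in results.items():
    print(f"n={n:5d}  sd(mean)={means.std():.3f}  sd(z)={z.std():.3f}  "
          f"P(p<=0.05)={np.mean(p <= 0.05):.3f}")
```

If the reasoning holds, the spread of the sample means should fall as `n` grows, while the spread of `z` stays near 1 and the share of p-values at or below 0.05 stays near 0.05.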

score 4 · Answer 2 · answered Jun 08 '23 at 15:47

When the null hypothesis is true, which it is in your code, p-values should be distributed as $U(0,1)$. This way, p-values mean what they claim to mean, in that the probability of getting a p-value of $0.05$ or lower is actually $0.05$.
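As a rough check of that claim, here is a sketch that assumes SciPy is available and uses `scipy.stats.ttest_1samp` as a stand-in for the asker's `p_value_calculator`, whose definition is not shown. The seed and the list of thresholds are arbitrary choices. It compares the observed share of p-values at or below each threshold with the threshold itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility
iterations, n = 5000, 200

# Each row is one simulated sample of 200 heights with the null (mean 175) true
samples = rng.normal(175, 6, size=(iterations, n))

# One-sample, two-sided t-test per row against the null mean of 175
p_values = stats.ttest_1samp(samples, popmean=175, axis=1).pvalue

# Under U(0, 1), the share of p-values <= alpha should be close to alpha
rates = {alpha: float(np.mean(p_values <= alpha))
         for alpha in (0.01, 0.05, 0.10, 0.50)}
for alpha, rate in rates.items():
    print(f"alpha={alpha:.2f}  observed share={rate:.3f}")
```

Up to simulation noise, each observed share should sit close to its threshold.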

Dave

score 4 · Answer 3 · answered Jun 08 '23 at 15:48

If your data follows the null hypothesis, the p value is uniformly distributed on the unit interval, by the definition of the p value. Thus, the sample size has no impact on the distribution.
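To spell out why this follows, here is a short derivation under some simplifying assumptions that the answer does not state: a continuous test statistic $T$ with distribution function $F_0$ under the null, and an upper-tailed p-value $p = 1 - F_0(T)$. Since $F_0(T) \sim U(0,1)$ under the null (the probability integral transform),

$$
P_{H_0}(p \le \alpha) = P_{H_0}\bigl(1 - F_0(T) \le \alpha\bigr) = P_{H_0}\bigl(F_0(T) \ge 1 - \alpha\bigr) = 1 - (1 - \alpha) = \alpha, \qquad 0 \le \alpha \le 1,
$$

which is the distribution function of $U(0,1)$. The sample size $n$ does not appear in the result, because it enters only through $F_0$.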

If your data does not follow the null hypothesis, the p value follows a different distribution, which will typically become more and more extreme as you collect more data, so you indeed have a dependence on the sample size in this case.
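The dependence on sample size under the alternative can be illustrated with a small simulation. This is a sketch under assumptions of my own: the true mean of 176, the sample sizes, and the seed are hypothetical choices, and a known-$\sigma$ z-test is used for simplicity rather than the t-test from the question.

```python
import math
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed for reproducibility
# Hypothetical setup: the null says 175, but the data really have mean 176
null_mean, true_mean, sigma, iterations = 175.0, 176.0, 6.0, 5000
erfc = np.vectorize(math.erfc)

summary = {}
for n in (20, 200, 1000):
    samples = rng.normal(true_mean, sigma, size=(iterations, n))
    # z-statistic computed against the (false) null mean
    z = (samples.mean(axis=1) - null_mean) / (sigma / math.sqrt(n))
    p = erfc(np.abs(z) / math.sqrt(2))  # two-sided p-value
    summary[n] = (float(np.median(p)), float(np.mean(p <= 0.05)))
    print(f"n={n:5d}  median p={summary[n][0]:.4f}  "
          f"share with p<=0.05={summary[n][1]:.3f}")
```

With the null false, the p-values should pile up near 0 more strongly as `n` increases, so the median p-value falls and the share at or below 0.05 rises.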
