
I'm conducting a simulation study where I repeatedly sample heights from a normal distribution with a mean of 175 cm and a standard deviation of 6 cm. For each sample I then calculate a one-sample t-test p-value against a null hypothesis mean of 175 cm. My sample sizes are quite large (200 observations per sample).

I expected the p-values to follow a unimodal distribution. My reasoning was that, compared with a small sample (where the p-values are uniformly distributed), a large sample gives a more accurate sample mean that is closer to 175, which should push the p-values toward 1 (a left-skewed distribution). I was wondering whether a larger sample size changes this distribution. Specifically, I used this code:

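```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def p_value_calculator(sample, n, mu0):
    # Assumed stand-in: the question does not show this helper.
    # Two-sided one-sample t-test of H0: mean == mu0 on the first n values
    return stats.ttest_1samp(sample[:n], mu0).pvalue

iterations = 5000
p_values = np.empty(iterations)

for i in range(iterations):
    sample_heights = np.random.normal(175, 6, 200)  # true mean 175, sd 6, n = 200
    p_values[i] = p_value_calculator(sample_heights, 200, 175)  # H0 mean 175

plt.hist(p_values, bins=30, edgecolor='black')
plt.xlabel('p-value')
plt.ylabel('Count')
plt.title('Distribution of p-values')
plt.show()
```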

Despite the large sample size, I noticed that the p-values appear uniformly distributed, which surprised me. Does sample size affect the distribution of p-values, or is it always uniform when the null hypothesis is true?

Saiko

3 Answers


As others have said, p-values have a uniform distribution when the null hypothesis is true (for a continuous test statistic). I want to address your intuition that, because sample means are closer to the population mean with a larger sample, the p-values should be closer to 1 and therefore unimodal. It is true that sample means get closer to the population mean as $n$ grows, but the p-value is determined by the test statistic, not by the sample mean alone. For the z-test that statistic is $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ (in your t-test, the sample standard deviation $s$ takes the place of $\sigma$). The typical distance $|\bar{x} - \mu|$ shrinks like $1/\sqrt{n}$, and the standard error $\sigma/\sqrt{n}$ shrinks at exactly the same rate, so the two effects cancel: under the null hypothesis the z-statistic keeps its standard normal distribution for every $n$, and the p-value keeps its uniform distribution.
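A small simulation can illustrate this cancellation. It is a sketch that assumes $\sigma = 6$ is known, i.e. it uses the z-statistic from the formula above rather than the t-statistic from the question; the helper name `simulate_z`, the sample sizes and the number of replications are illustrative choices.

```python
import math
import numpy as np


def simulate_z(n, reps=4000, mu=175.0, sigma=6.0, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2), so H0: mean == mu is true.

    Returns the sample means and the z-statistics (xbar - mu) / (sigma / sqrt(n)),
    treating sigma as known (a z-test, not the t-test from the question).
    """
    rng = np.random.default_rng(seed)
    means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    z = (means - mu) / (sigma / math.sqrt(n))
    return means, z


summary = {}
for n in (10, 100, 1000):
    means, z = simulate_z(n)
    summary[n] = (means.std(), z.std())
    # sd of the sample mean tracks sigma / sqrt(n); sd of z stays close to 1
    print(f"n={n:4d}  sd(mean)={means.std():.3f}  "
          f"sigma/sqrt(n)={6.0 / math.sqrt(n):.3f}  sd(z)={z.std():.3f}")
```

The first two columns should shrink together as $n$ grows while sd(z) stays near 1, which is the effect described above.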

Nick Cox
Noah

When the null hypothesis is true, which it is in your code, p-values should be distributed as $U(0,1)$. That is what makes p-values mean what they claim to mean: the probability of getting a p-value of $0.05$ or lower is exactly $0.05$.
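The claim $P(p \le \alpha) = \alpha$ can be checked directly. A short derivation, assuming a test statistic $S$ for which larger values count as more extreme (for the two-sided t-test, $S = |T|$) and whose null distribution function $G$ is continuous and strictly increasing, so that $p = 1 - G(S)$:

$$
\begin{aligned}
P_{H_0}(p \le \alpha) &= P_{H_0}\big(G(S) \ge 1 - \alpha\big) \\
&= P_{H_0}\big(S \ge G^{-1}(1 - \alpha)\big) \\
&= 1 - G\big(G^{-1}(1 - \alpha)\big) \\
&= \alpha, \qquad 0 < \alpha < 1.
\end{aligned}
$$

Since this holds for every $\alpha$, $p \sim U(0,1)$. The sample size enters only through $G$ (for the two-sided t-test, the distribution of $|T|$ with $T \sim t_{n-1}$), and the argument goes through for any $G$ of this kind, so changing $n$ does not change the result.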

Dave

If your data follow the null hypothesis, the p-value is uniformly distributed on the unit interval; this follows from the definition of the p-value (for a continuous test statistic). Thus, the sample size has no impact on the distribution.

If your data do not follow the null hypothesis, the p-value follows a different distribution, which typically concentrates closer and closer to 0 as you collect more data, so in this case the distribution does depend on the sample size.
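A minimal sketch of both cases. To stay self-contained with NumPy and the standard library, it uses a two-sided z-test with $\sigma = 6$ treated as known instead of the t-test from the question; the helper `z_test_p_values`, the alternative true mean of 176 cm, the sample sizes and the number of replications are illustrative assumptions.

```python
import math
import numpy as np


def z_test_p_values(n, true_mean, mu0=175.0, sigma=6.0, reps=4000, seed=1):
    """Two-sided one-sample z-test p-values (sigma treated as known) for
    `reps` simulated samples of size n drawn from N(true_mean, sigma^2)."""
    rng = np.random.default_rng(seed)
    means = rng.normal(true_mean, sigma, size=(reps, n)).mean(axis=1)
    z = (means - mu0) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for standard normal Z equals erfc(|z| / sqrt(2))
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])


sizes = (10, 100, 1000)
null_deciles = {}  # H0 true: share of p-values in each tenth of [0, 1]
alt_power = {}     # H0 false: share of p-values below 0.05
alt_median = {}    # H0 false: median p-value

for n in sizes:
    p_null = z_test_p_values(n, true_mean=175.0)
    counts, _ = np.histogram(p_null, bins=10, range=(0.0, 1.0))
    null_deciles[n] = counts / len(p_null)

    p_alt = z_test_p_values(n, true_mean=176.0)
    alt_power[n] = np.mean(p_alt < 0.05)
    alt_median[n] = np.median(p_alt)

    print(f"n={n:4d}  H0 true deciles: {np.round(null_deciles[n], 3)}")
    print(f"        H0 false: P(p < 0.05)={alt_power[n]:.3f}  "
          f"median p={alt_median[n]:.3g}")
```

Under H0 every decile should hold roughly 10% of the p-values at each sample size, while under the alternative the share of p-values below 0.05 grows toward 1 and the median p-value heads toward 0 as $n$ increases.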

Stephan Kolassa