
I'm using the SciPy implementation of the Kolmogorov–Smirnov test to check whether collections of random values are likely to have been drawn from a uniform distribution.

From what I understand, the second value of the tuple returned by kstest is a p-value indicating the likelihood that the data was generated according to the null hypothesis. Based on repeated tests of 1,000,000 observations drawn from a uniform distribution on the interval from 1 to 10, the p-values appear to be uniformly distributed between 0 and 1. I was expecting these values to be fairly consistently < 0.05.

Here is the code I've used to recreate the issue:

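from scipy import stats

low = 1.0
high = 10.0
count = 1000000
# Uniform distribution on [low, high]; SciPy parameterises it as [loc, loc + scale].
u_dist = stats.uniform(loc=low, scale=high - low)
for i in range(10):
    # Draw a fresh sample and test it against the CDF it was drawn from.
    uniforms_rvs = u_dist.rvs(size=count)
    statistic, p_value = stats.kstest(uniforms_rvs, u_dist.cdf)
    print(f"Statistic: {statistic}, P-Value: {p_value}")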

Output

Statistic: 0.0009491697831712775, P-Value: 0.32830089131050355
Statistic: 0.0008462255359052984, P-Value: 0.47081659515626595
Statistic: 0.0005102093878102121, P-Value: 0.9569178628542782
Statistic: 0.0007726509699242379, P-Value: 0.58893117128815
Statistic: 0.0013884156783960933, P-Value: 0.04229083546591694
Statistic: 0.0007256760537448503, P-Value: 0.6678951741671402
Statistic: 0.0006701156877765291, P-Value: 0.7599638407223168
Statistic: 0.0012097066832155168, P-Value: 0.10703563365795754
Statistic: 0.0016080431661646966, P-Value: 0.011338714805479664
Statistic: 0.0008064625528795277, P-Value: 0.5333953515406695

Am I missing something?

  • Your null hypothesis is true, isn’t it? – Dave Apr 11 '22 at 11:39
  • Ah, yes, I'd completely misinterpreted the p-value. The null hypothesis is that they were drawn from the same distribution. Thank you – Jack O'Neill Apr 11 '22 at 11:50
  • You may be further interested in: https://stats.stackexchange.com/q/10613. – Alexandru Dinu Apr 11 '22 at 12:00
  • ... or possibly even https://stats.stackexchange.com/questions/31, concerning interpreting p-values. – whuber Apr 11 '22 at 12:08
  • I understand the concept of p-values generally. The confusion here stemmed from the fact that the p-value represents the probability that the values were not chosen from the given distribution, rather than the probability that they were. – Jack O'Neill Apr 12 '22 at 13:09
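
The point raised in the comments can be checked with a small simulation. The following is a minimal sketch, assuming NumPy and SciPy are available. It runs the same KS test many times, first on samples that really come from the uniform distribution on [1, 10] (null hypothesis true) and then on samples from a slightly wider uniform distribution (null hypothesis false). The seed, sample size, number of trials, the wider interval, and the helper name `ks_p_values` are illustrative choices, not taken from the question. When the null hypothesis is true, the p-value is itself uniformly distributed on [0, 1], so roughly 5% of runs fall below 0.05; small p-values become the norm only when the data does not follow the tested distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable (illustrative)

low, high = 1.0, 10.0
null_dist = stats.uniform(loc=low, scale=high - low)  # distribution being tested against

n_trials = 500       # number of repeated tests (illustrative)
sample_size = 2_000  # observations per test (illustrative)


def ks_p_values(sample_high):
    """Hypothetical helper: p-values of repeated KS tests against null_dist,
    using samples drawn uniformly from [low, sample_high]."""
    return np.array([
        stats.kstest(rng.uniform(low, sample_high, size=sample_size), null_dist.cdf).pvalue
        for _ in range(n_trials)
    ])


# Null hypothesis true: samples come from the tested distribution.
p_null = ks_p_values(high)
# Null hypothesis false: samples come from a slightly wider interval (assumed alternative).
p_alt = ks_p_values(high + 0.5)

null_rate = np.mean(p_null < 0.05)
alt_rate = np.mean(p_alt < 0.05)

print(f"Null true:  mean p-value = {p_null.mean():.3f}, fraction below 0.05 = {null_rate:.3f}")
print(f"Null false: mean p-value = {p_alt.mean():.3g}, fraction below 0.05 = {alt_rate:.3f}")
```

Under these assumptions the first line should report a mean p-value near 0.5 and a rejection fraction near 0.05, matching the spread of p-values in the question's output, while the second line should report rejection in almost every run.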
