I am trying to test whether the sampled intervals between random events fit a particular geometric distribution, and I am pretty lost as to what I'm doing wrong.
Assuming there's nothing wrong with the library, here is what I'm doing:

import scipy.stats

# Take n samples from the geometric distribution
n = 100
data = scipy.stats.geom.rvs(0.2, size=n)

# Perform the KS test against the CDF of the same distribution
print(scipy.stats.kstest(data, lambda x: scipy.stats.geom.cdf(x, 0.2)))
This results in p-values near 0 (1e-5) with a sample size of 100. Increasing the sample size causes the p-values to decrease further (e.g., 1000 samples results in 1e-35), which is the opposite of what I expect.
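For comparison, here is a quick sanity-check sketch of the same procedure against a continuous distribution (the standard normal); under the null hypothesis the KS p-value should be roughly uniform on [0, 1] rather than collapsing toward 0 as the sample grows:

import scipy.stats

# Sanity-check sketch: the identical KS procedure on a continuous distribution.
data = scipy.stats.norm.rvs(size=1000)
print(scipy.stats.kstest(data, scipy.stats.norm.cdf))
# Under H0 the p-value is uniform on [0, 1], so it should rarely be tiny.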
Am I making some incorrect statistical assumptions? Is something wrong with my methodology? Is goodness-of-fit testing not what I'm looking for? Are there other statistical tests I can do instead?
Using ks.test to match the ECDF of a sample from a discrete distribution against the CDF of that distribution often yields a warning message that the test does not give an accurate p-value in the presence of ties. A sample from a geometric distribution with small $p$ will typically have massive numbers of ties: my geometric sample of size 1000 with $p = 0.1$ had only about 80 distinct values, and it gave a p-value very near 0. (Visually, the ECDF plot matched the CDF just fine.) – BruceET Nov 11 '21 at 16:56

length(unique(rgeom(1000, .1))) returned $51$. And ks.test(rgeom(1000, .1), pgeom, .1) returns a p-value of 4.122e-09 along with Warning message: ... ties should not be present for the Kolmogorov-Smirnov test. – BruceET Nov 11 '21 at 17:13
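Following up on the comments: since the KS test assumes a continuous distribution (no ties), a standard alternative for discrete data is a chi-square goodness-of-fit test on binned counts. Below is a minimal sketch in Python under the same geometric setup as the question; the tail cutoff K = 15 is an arbitrary illustrative choice, and because p is fixed a priori rather than estimated from the data, no degrees-of-freedom correction is needed:

import numpy as np
import scipy.stats

p, n = 0.2, 1000
rng = np.random.default_rng(0)
data = scipy.stats.geom.rvs(p, size=n, random_state=rng)

# The sample has heavy ties: only a few dozen distinct values among 1000 draws.
print(len(np.unique(data)))

# Observed counts for k = 1..K, with everything above K pooled into a tail bin
# so the expected count in every bin stays reasonably large.
K = 15
observed = np.array([np.sum(data == k) for k in range(1, K + 1)])
observed = np.append(observed, np.sum(data > K))

# Expected counts under the hypothesized geometric(p) distribution.
expected = n * scipy.stats.geom.pmf(np.arange(1, K + 1), p)
expected = np.append(expected, n * scipy.stats.geom.sf(K, p))  # tail mass P(X > K)

# Chi-square goodness-of-fit test; p was not estimated, so ddof = 0 (default).
stat, pval = scipy.stats.chisquare(observed, expected)
print(stat, pval)

Unlike the KS p-values in the question, this p-value should be roughly uniform under the null, i.e. it should not systematically shrink as n grows.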