I want to check if the numbers produced by my random number generator are uniformly distributed. My code is below - is the statistical approach correct?

Disclaimers - this isn't homework, it's personal interest. And for the 'what research have you done' brigade, I have done a fair bit of rummaging on stats.stackexchange, and while there are plenty of similar questions, I cannot find any that answer the above question specifically.

c(126L, 106L, 182L, 162L, 232L, 130L, 113L, 75L, 191L, 19L, 242L, 245L, 152L, 241L, 240L, 140L, 208L, 163L, 174L, 194L, 69L, 194L, 200L, 7L, 213L, 145L, 170L, 198L, 50L, 207L, 247L, 116L, 162L, 173L, 168L, 232L, 23L, 156L, 62L, 165L, 19L, 206L, 250L, 150L, 170L, 29L, 62L, 26L, 209L, 120L, 131L, 170L, 197L, 1L, 153L, 195L, 250L, 29L, 151L, 912L, 139L, 23L, 211L, 237L, 10L, 248L, 119L, 138L, 118L, 190L, 207L, 136L, 13L, 55L, 117L, 239L, 90L, 18L, 196L, 120L, 140L, 170L, 189L, 44L, 250L, 131L, 241L, 150L, 86L, 146L, 84L, 138L, 56L, 125L, 199L, 188L, 225L, 67L, 63L, 914L, 77L, 190L, 121L, 206L, 150L, 226L, 50L, 77L, 76L, 36L, 12L, 126L, 169L, 168L, 160L, 199L, 173L, 68L, 192L, 21L, 246L, 250L, 88L, 244L, 196L, 244L, 46L, 142L, 133L, 224L, 73L, 78L, 38L, 222L, 174L, 39L, 49L) -> numbers_from_random_app

table(numbers_from_random_app) -> distribution

stats::chisq.test(distribution)

luciano
  • This is indeed a good question that merits a well-researched answer. But merely checking whether the code is right seems borderline off-topic (this is my opinion). Meanwhile, check these posts: "Prove a random generated number is uniform distributed" and "Verifying that a random generator outputs a uniform distribution". – User1865345 Feb 02 '23 at 15:28
  • The chi-squared test is not really designed for that table: you have $137$ values from $1$ up to $914$, so most numbers do not appear at all and the majority of the numbers which do appear, appear only $1$ time. Since your method ignores those which do not appear, it will not spot that there are no values from $251$ through to $911$, which suggests to me that the values were not selected from a uniform distribution on an interval. – Henry Feb 02 '23 at 15:28
  • @User1865345 it isn't a question about programming. The code is correct in terms of running without errors. The question is about whether the statistics applied are correct - I have edited the question. – luciano Feb 02 '23 at 17:13
  • I have voted to reopen @luciano. – User1865345 Feb 02 '23 at 17:17
  • "The code is correct in terms of running without errors": that's not a very reliable way to check the correctness of a piece of code. – dipetkov Feb 02 '23 at 17:20
  • You might look at the output from table(numbers_from_random_app) and see if that's the object you want to be testing. – Sal Mangiafico Feb 02 '23 at 18:09
  • Or, as another way to see that this isn't doing what you want, start with a log-normal distribution of values and follow the same process. E.g.: A = round(rlnorm(100, 0, 1), 2); hist(A); Table = table(A); Table; chisq.test(Table) – Sal Mangiafico Feb 02 '23 at 19:20
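
The point raised in the comments above can be sketched in R. The data here are simulated and hypothetical (135 draws from 1 to 250 plus the two values 912 and 914), and the intended range of 1 to 1000 is an assumption, since the question does not state one. Tabulating the raw values keeps only the values that occurred, whereas binning over the full assumed range keeps the empty bins, so a chi-squared goodness-of-fit test can see the gap.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical data mimicking the question: most values in 1..250, two near 900
x <- c(sample(1:250, 135, replace = TRUE), 912L, 914L)

# Tabulating raw values drops every value that never occurred
naive <- table(x)
length(naive)  # number of distinct observed values, far fewer than 1000

# Assumed intended range 1..1000, split into 10 equal-width bins;
# cut() returns a factor, so table() keeps the empty bins as zero counts
bins <- cut(x, breaks = seq(0, 1000, by = 100))
counts <- table(bins)
counts

# Goodness-of-fit test against equal expected counts in every bin
binned_test <- chisq.test(counts)
binned_test$p.value
```

The bin width is a judgment call: wider bins give larger expected counts per bin, which the chi-squared approximation needs, at the cost of resolution.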

1 Answer

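Consider the Kolmogorov-Smirnov (nonparametric) test, which compares the empirical cumulative distributions of two data sets, or the empirical cumulative distribution of one data set with a reference distribution.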

  • Null hypothesis: the two data sets are drawn from the same continuous distribution
  • If the p value is small, conclude that the two groups were sampled from populations with different distributions.
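
As a rough illustration of what the test measures, the two-sample statistic D is the largest vertical gap between the two empirical CDFs. The sketch below computes that gap by hand on simulated data and compares it with the value reported by ks.test; the variable names and the seed are arbitrary choices made for this example.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

x <- rnorm(100)
y <- rnorm(100)

# Empirical CDFs of each sample (step functions)
Fx <- ecdf(x)
Fy <- ecdf(y)

# The ECDFs only change at observed points, so checking those is enough
pooled <- sort(c(x, y))
D_manual <- max(abs(Fx(pooled) - Fy(pooled)))

# Statistic reported by the built-in test
D_builtin <- unname(ks.test(x, y)$statistic)

all.equal(D_manual, D_builtin)
```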

K-S test with two samples from the same distribution. Note that ecdf computes the empirical cumulative distribution function (ECDF) of a sample.

data1 <- rnorm(100)
data2 <- rnorm(100)
ks.test(data1, data2)
# Overlay the two empirical CDFs for a visual comparison
plot(ecdf(data1))
lines(ecdf(data2), col = "red")

K-S test with two different distributions

data1 <- rnorm(100)
data2 <- runif(100)
ks.test(data1, data2)
# Overlay the two empirical CDFs for a visual comparison
plot(ecdf(data1))
lines(ecdf(data2), col = "red")

K-S test against a uniform distribution with parameters 0 and 1

data2 <- runif(100)
ks.test(data2, "punif", min = 0, max = 1)

In your case, the last example meets your needs as long as you know the parameters of your uniform distribution.
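
To show why the parameters matter, here is a minimal sketch on simulated data (not the data from the question), assuming a generator meant to be uniform on 1 to 250; that range is a guess made for illustration. Testing against the wrong reference distribution rejects uniformity even though the sample is uniform. Keep in mind that the K-S test assumes a continuous distribution, so with integer-valued output like that in the question there will be ties, ks.test will typically warn about them, and the p-value is only approximate.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Simulated continuous sample on the assumed range 1..250
x <- runif(200, min = 1, max = 250)

# Reference distribution with the parameters the data were drawn with
right <- ks.test(x, "punif", min = 1, max = 250)

# Reference distribution with the wrong parameters (the punif defaults)
wrong <- ks.test(x, "punif", min = 0, max = 1)

right$p.value    # typically not small when the parameters match
wrong$statistic  # every value exceeds 1, so the ECDF gap is maximal
wrong$p.value    # essentially zero: uniformity on (0, 1) is rejected
```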

YLC