I'm trying to understand the output of the Kolmogorov-Smirnov test function in R (two-sample, two-sided). Here is a simple test.
x <- c(1,2,2,3,3,3,3,4,5,6)
y <- c(2,3,4,5,5,6,6,6,6,7)
z <- c(12,13,14,15,15,16,16,16,16,17)
ks.test(x,y)
# Two-sample Kolmogorov-Smirnov test
#
#data: x and y
#D = 0.5, p-value = 0.1641
#alternative hypothesis: two-sided
#
#Warning message:
#In ks.test(x, y) : cannot compute exact p-value with ties
ks.test(x,z)
#Two-sample Kolmogorov-Smirnov test
#data: x and z
#D = 1, p-value = 9.08e-05
#alternative hypothesis: two-sided
#
#Warning message:
#In ks.test(x, z) : cannot compute exact p-value with ties
ks.test(x,x)
#Two-sample Kolmogorov-Smirnov test
#data: x and x
#D = 0, p-value = 1
#alternative hypothesis: two-sided
#
#Warning message:
#In ks.test(x, x) : cannot compute exact p-value with ties
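Just to be sure I'm reading the printouts right, I also pulled the values straight out of the htest object that ks.test returns, and I get the same numbers (along with the same ties warnings):

ks.test(x,y)$p.value
# 0.1641
ks.test(x,z)$p.value
# 9.08e-05
ks.test(x,x)$p.value
# 1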
There are a few things I don't understand here.
From the help, it seems that the p-value refers to the hypothesis var1 = var2. However, here that would mean that the test says (p < 0.05):

a. Cannot say that X = Y;
b. Can say that X = Z;
c. Cannot say that X = X (!)
Besides x appearing to be different from itself (!), it also seems quite strange to me that x = z, as the two distributions have zero overlapping support. How is that possible?
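To make the "zero overlapping support" point concrete (this is just a sanity check of my own, not anything from ks.test), the ranges of the two samples don't even touch:

range(x)
# 1 6
range(z)
# 12 17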
According to the definition of the test, D should be the maximum difference between the two probability distributions. For instance, in the case (x, y) it should be D = max|P(x) - P(y)| = 3 (when P(x), P(y) aren't normalized) or D = 0.3 (if they are normalized); the sketch at the end of the post shows the calculation I have in mind. Why is D different from that?

I have intentionally made an example with many ties, as the data I'm working with have lots of identical values. Why do the ties confuse the test? I thought it calculated a probability distribution that should not be affected by repeated values. Any idea?
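For completeness, here is the naive calculation of D I have in mind (my own sketch, comparing the two histograms value by value; clearly not what ks.test computes internally):

vals <- sort(unique(c(x,y)))
px <- table(factor(x, levels=vals))  # count of each value in x
py <- table(factor(y, levels=vals))  # count of each value in y
max(abs(px - py))                    # the "histogram difference" I describe above
# 3  (raw counts)
max(abs(px/length(x) - py/length(y)))
# 0.3  (normalized)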