
Dear Cross Validated community,

We are working on an uncertainty & sensitivity analysis of a mathematical optimization model. More specifically, we have a set of uncertain parameters that follow specified probability distributions, and we sample from them to perform Monte Carlo simulations with our model.

For the sensitivity analysis, our method of choice is Monte Carlo filtering. Essentially, we first divide the model outputs from the Monte Carlo simulation into two subsets ('good' or 'bad') based on a given criterion (e.g. cost < 100). Then, we map this division back into the input sample space to obtain 'good' and 'bad' input samples for each parameter.
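For concreteness, here is a toy sketch of the filtering step (the model, parameter names, and the cost function are all hypothetical, just to illustrate the split):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: the output depends on two uncertain inputs.
n = 1000
x1 = rng.uniform(0.0, 10.0, n)   # continuous uncertain parameter
x2 = rng.integers(1, 11, n)      # discrete uniform parameter on {1, ..., 10}
cost = 8 * x1 + 3 * x2 + rng.normal(0, 5, n)

# Monte Carlo filtering: split the input samples by the output criterion.
good_mask = cost < 100           # 'good' (behavioural) runs
x1_good, x1_bad = x1[good_mask], x1[~good_mask]
x2_good, x2_bad = x2[good_mask], x2[~good_mask]
```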

As the key step of the method, we then perform a two-sample Kolmogorov-Smirnov test for each input parameter separately. The two samples used in the test are the 'good' and 'bad' input subsets from the previous division.

As a metric for parameter importance: if the test indicates that the two subsets come from different distributions, the parameter is deemed important, since high or low values of that parameter make a 'good' or 'bad' outcome more likely. If not, the parameter is unimportant, as it can lead to either 'good' or 'bad' model outputs regardless of its value.
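The per-parameter test could look like this (the subsamples here are synthetic stand-ins for the 'good'/'bad' subsets of one parameter):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)

# Hypothetical 'good'/'bad' subsamples of a single input parameter,
# as produced by the filtering step.
good = rng.uniform(0.0, 6.0, 400)
bad = rng.uniform(4.0, 10.0, 600)

result = ks_2samp(good, bad)
# Small p-value -> the two subsets differ -> parameter deemed important.
is_important = result.pvalue < 0.05
```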

This method is proven, tested, and used extensively in model-based studies. However, in most cases, the input parameters used in the Monte Carlo simulations and the subsequent K-S tests are sampled from continuous distributions.

My question then is: if one input parameter is sampled from a discrete uniform distribution, e.g. on the integer range [1, 10], does our method still work, or does it break because the two-sample KS test is only valid for continuous distributions (as I have also read in numerous threads here, but with contradicting info in a Mathematics Stack Exchange thread here)?

From my perspective, on the one hand, the test would still give us the distance between the two empirical CDFs even if the distributions are discrete, and we could compare this distance to the critical value $c(\alpha) \cdot \sqrt{\frac{n + m}{n \cdot m}}$.
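To make the comparison explicit, this helper computes the asymptotic rejection threshold above, using the standard closed form $c(\alpha) = \sqrt{-\ln(\alpha/2)/2}$ (this is the usual continuous-case approximation, which is exactly the part whose validity for discrete data is in question):

```python
import numpy as np

def ks_critical(n, m, alpha=0.05):
    # Asymptotic two-sample KS rejection threshold:
    # reject H0 if D > c(alpha) * sqrt((n + m) / (n * m)),
    # where c(alpha) = sqrt(-ln(alpha / 2) / 2).
    c_alpha = np.sqrt(-np.log(alpha / 2) / 2)
    return c_alpha * np.sqrt((n + m) / (n * m))
```

For example, with two subsets of 100 samples each at $\alpha = 0.05$, the threshold is about 0.192.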

Alternatively, we could sample this parameter from a continuous uniform distribution $U[1,10]$ to adhere to the KS test's requirements, and simply round to the nearest integer in the model before running our simulations.
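A small implementation note on this workaround: rounding $U[1,10]$ directly gives the endpoints 1 and 10 only half the probability of the interior values, so the sketch below samples on $[0.5, 10.5]$ instead to keep all ten integers equally likely (the clip only guards the boundary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample continuously on [0.5, 10.5] and round, so each integer 1..10
# gets probability 1/10. (Rounding U[1, 10] itself would give the
# endpoints 1 and 10 only half the weight of the interior values.)
u = rng.uniform(0.5, 10.5, 10_000)
x = np.clip(np.round(u), 1, 10).astype(int)
```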

Any feedback on this? Any help will be greatly appreciated!

gmavrom
  • The linked math.SE post simply shows how the usual calculation is carried out on those data. I see nothing in it that says "Kolmogorov-Smirnov works as advertised on discrete distributions"; that specific question is simply not considered at all there. Just as you'd expect when the person answering has no idea that it's even an issue. – Glen_b May 13 '23 at 01:36
  • An example showing the extent of the problem is illustrated here. What can be done about it is briefly discussed here: https://stats.stackexchange.com/questions/142238/comparing-frequency-distributions/142240#142240 or here: https://stats.stackexchange.com/questions/139612/can-i-use-likelihood-ratio-test-to-compare-two-samples-drawn-from-power-law-dist/139719#139719 ... many more such discussions are on site already. TLDR summary: "permutation testing works" ... ctd – Glen_b May 13 '23 at 01:43
  • ctd ... if you need a reference, you could try this one: https://stats.stackexchange.com/questions/88764/test-for-difference-between-2-empirical-discrete-distributions/88791#88791 , which I have not read, but it sounds like it may offer a shortcut (not that permutation tests are particularly onerous as is) – Glen_b May 13 '23 at 01:43
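Following the permutation-test suggestion in the comments, here is a minimal sketch of what that could look like (my own assumed implementation, not taken from any of the linked answers): the KS statistic is computed directly from the empirical CDFs, which is well defined with ties, and its null distribution is obtained by relabelling the pooled sample rather than from the continuous-case formula.

```python
import numpy as np

def ks_statistic(a, b):
    # Two-sample KS statistic: max distance between the two empirical
    # CDFs, evaluated over the pooled values (well defined with ties,
    # i.e. for discrete data).
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(np.sort(a), pooled, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), pooled, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def permutation_ks_pvalue(a, b, n_perm=999, seed=0):
    # Permutation test: recompute D under random relabellings of the
    # pooled sample, so the null distribution respects the discreteness
    # of the data instead of assuming a continuous parent distribution.
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([a, b])
    d_obs = ks_statistic(a, b)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        count += ks_statistic(perm[: len(a)], perm[len(a):]) >= d_obs
    return (count + 1) / (n_perm + 1)
```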

0 Answers