I generated distributions of travel times of commuters using transportation simulation tools (for different scenarios). The distributions are attached below. I wish to statistically compare each pair of these non-parametric distributions.

[Figure: the simulated travel-time distributions]

Null hypothesis: the distributions belong to the same population and differ only by chance (randomness).

Alternative hypothesis: the distributions do not belong to the same population, i.e. the factors varied in each simulation affected the outcome distribution.

Q1. Which test should I use? Some tests compare medians, but these distributions can have multiple peaks, so similar medians do not mean they belong to the same population.

Q2. I am currently using the Kolmogorov-Smirnov test, which looks for the maximum gap between the cumulative distributions. Can I use a chi-square test instead?
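To make the two candidate statistics in Q2 concrete, here is a minimal standard-library sketch that computes both on the same pair of samples. It is not a full test: it returns the two-sample KS statistic and the chi-square homogeneity statistic with its degrees of freedom, but no p-values. The function names, the `simulate` helper, the mixture shape, the sample size, and the 0.05-hour bin grid are all assumptions made up for illustration; they are not the actual simulation outputs.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest vertical gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is attained at one of the observed values.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def bin_counts(values, lo, width, n_bins):
    """Histogram counts on a fixed grid; out-of-range values go to the end bins."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - lo) // width)
        i = min(max(i, 0), n_bins - 1)
        counts[i] += 1
    return counts


def chi_square_statistic(a, b, lo, width, n_bins):
    """Chi-square homogeneity statistic for a 2 x k table of binned counts."""
    counts_a = bin_counts(a, lo, width, n_bins)
    counts_b = bin_counts(b, lo, width, n_bins)
    n_a, n_b = len(a), len(b)
    total = n_a + n_b
    stat, used_bins = 0.0, 0
    for ca, cb in zip(counts_a, counts_b):
        col = ca + cb
        if col == 0:
            continue  # bins empty in both samples carry no information
        exp_a = n_a * col / total  # expected count under "same population"
        exp_b = n_b * col / total
        stat += (ca - exp_a) ** 2 / exp_a + (cb - exp_b) ** 2 / exp_b
        used_bins += 1
    return stat, used_bins - 1  # statistic and degrees of freedom


# Hypothetical travel times (hours): a broad component plus a sharp peak.
rng = random.Random(0)


def simulate(peak, n=2000):
    return [
        rng.gauss(peak, 0.02) if rng.random() < 0.3 else rng.gauss(1.5, 0.4)
        for _ in range(n)
    ]


run_a = simulate(2.00)  # peak at 2.00 h
run_b = simulate(2.05)  # peak shifted by one bin width
print("KS D:", ks_statistic(run_a, run_b))
print("chi-square, dof:", chi_square_statistic(run_a, run_b, 0.0, 0.05, 60))
```

The chi-square statistic depends on the chosen bin grid and needs reasonably large expected counts per bin to be reliable, while the KS statistic needs no binning at all.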

AdamO
SiH
  • 1) What does "nonparametric distributions" mean to you? // 2) What about the KS test do you not like? // 3) What would you like about a chi-squared test? – Dave Oct 28 '21 at 15:38
  • Did you mean to include histograms or smoothed density estimates of your simulated distributions? How many distributions have you generated and are testing? For instance, if you have 4 distributions, are you calculating 6 pairwise comparisons or do you want 1 global test? – AdamO Oct 28 '21 at 15:46
  • If there is no family of distributions specified, you can't use hypothesis testing, because there are an infinite number of hypotheses to be tested. Hypothesis testing works on specific parameters of interest, and you can look at those. – Paul Oct 28 '21 at 15:56
  • @Dave I am not an expert in statistics, but I will try to answer based on my limited knowledge. 1. Non-parametric distribution: no assumption is made about the shape of the underlying distribution. The distributions are generated from simulations. – SiH Oct 28 '21 at 15:58
  • @Dave 2. I feel the KS test is very sensitive to the peaks. For example, one distribution has a high peak at x = 2 while another has a peak at x = 2.05 (the bin size is 0.05). The remaining shapes are nearly identical. The KS test would look at the cumulative distributions, and because of the peaks they would be statistically different. However, a chi-square test would compare the distributions at every bin. Therefore, I feel chi-square is the better test here. – SiH Oct 28 '21 at 15:58
  • @AdamO There are 54 distributions. I wish to compare each of 1431 pairs. – SiH Oct 28 '21 at 15:59
  • 1) Making $1,431$ comparisons brings up controls for multiple testing, the easiest of which (e.g., Bonferroni) will sap away your power to reject. // 2) Why would you not want to catch that $2$ vs $2.05$ difference? – Dave Oct 28 '21 at 16:01
  • @Paul The shapes are arbitrary and I can assume any distribution. I was thinking that the mean square of the differences could be calculated between distributions to show they are different from each other. I thought of using something like chi-squared. – SiH Oct 28 '21 at 16:02
  • @Dave, let's say the x-axis is the travel time (in hours). The two simulations give similar distributions. However, one shows a peak at 2 hours and the other at 2.05 hours. I feel that on average they are the same. A test which captures the mean of the squares of the differences would be ideal (something like chi-square). – SiH Oct 28 '21 at 16:06
  • 1) But the evidence says that they are not the same. Do you just mean that they are "close enough"? // 2) Mean square difference of what? You need to have some kind of pairing of points for differences to make sense. // 3) I think you're making a common mistake and using hypothesis testing inappropriately. Hypothesis testing is extremely literal. If you have a null of equality and there is evidence that the distributions are a little bit different, the hypothesis test should catch that and has made a type II error if it does not. Hypothesis testing does not tell you about "close enough". – Dave Oct 28 '21 at 16:09
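
The multiple-testing point raised in the comments can be made concrete with a little arithmetic. The sketch below counts the pairwise comparisons among 54 distributions and applies a Bonferroni correction. The family-wise level of 0.05, the helper name `bonferroni_reject`, and the three example p-values are assumptions made up for illustration; they are not results from the simulations.

```python
from itertools import combinations
from math import comb

n_distributions = 54                 # figure given in the comments
n_pairs = comb(n_distributions, 2)   # number of unordered pairs
alpha = 0.05                         # assumed family-wise error rate
per_test_alpha = alpha / n_pairs     # Bonferroni threshold for each comparison

# The same pairs, enumerated explicitly by index.
pairs = list(combinations(range(n_distributions), 2))


def bonferroni_reject(p_values, alpha=0.05):
    """One True/False per p-value: reject only if p < alpha / (number of tests)."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical p-values for three comparisons, made up for illustration.
example_p = [0.001, 0.02, 0.04]

print(n_pairs, len(pairs), per_test_alpha)
print(bonferroni_reject(example_p))
```

With 1,431 comparisons the per-test threshold drops to roughly 0.000035, which is the loss of power mentioned above. A rejection at that level still only says the two distributions differ; it does not say whether they are "close enough" for practical purposes.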