Ok, I compute distances between binned histograms using the Wasserstein metric, specifically this Python function: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wasserstein_distance.html It seems to work, but the raw numbers I get are hard to interpret. How can I transform the result of comparing two histograms into a percentage, from 0% (no similarity) to 100% (identical histograms), so that the percentages in between also mean something?
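For context, here is a minimal sketch of how I compute a distance between two binned histograms (synthetic data for illustration; I assume all histograms share the same bin edges):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Two synthetic samples whose true distributions differ by a mean shift of 0.5
a = rng.normal(0.0, 1.0, 5000)
b = rng.normal(0.5, 1.0, 5000)

# Bin both samples on a shared grid
bins = np.linspace(-5, 5, 41)
counts_a, _ = np.histogram(a, bins=bins)
counts_b, _ = np.histogram(b, bins=bins)
centers = (bins[:-1] + bins[1:]) / 2

# Pass bin centers as values and counts as weights; SciPy normalizes
# the weights internally, so raw counts work directly.
d = wasserstein_distance(centers, centers,
                         u_weights=counts_a, v_weights=counts_b)
print(d)  # roughly 0.5, the shift in the mean
```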
- If this is a follow-up post to your earlier question, then please include this piece of information in your post; such context can be important. As here: if your histograms all use the same bins, you can normalize them to densities instead of counts, and then you don't need the distance to be normalized. So: do you really need a distance that is normalized to $[0,1]$? If so, why? Note that this is likely not easy and very context-dependent. – Stephan Kolassa Jul 31 '22 at 15:06
- The idea is that comparing all the histograms at once just gives me the answer "yes, the histograms are the same" or "no, they are not", without telling me which histogram is the problematic outlier. So, following up on your suggestion: I feed the Wasserstein distances between the histograms directly into the DBSCAN algorithm, right? – just_learning Jul 31 '22 at 15:29
- Please read my answer to your earlier question again: I recommended that you first calculate distances (which do not need to be normalized), then feed these distances into the DBSCAN clustering algorithm. Of course the raw distances between your histograms won't tell you anything on their own, because you have no frame of reference. – Stephan Kolassa Jul 31 '22 at 15:33
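The workflow recommended above can be sketched as follows: build the pairwise Wasserstein distance matrix, then hand it to DBSCAN with `metric="precomputed"` so the outlier histogram is flagged without any normalization of the distances. The data below is synthetic (nine similar histograms plus one shifted outlier), and the `eps` value is an assumption that must be tuned to the distance scale of your actual data:

```python
import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
bins = np.linspace(-5, 5, 41)
centers = (bins[:-1] + bins[1:]) / 2

# Hypothetical data: nine histograms drawn from N(0, 1) and one outlier from N(3, 1)
samples = [rng.normal(0.0, 1.0, 2000) for _ in range(9)]
samples.append(rng.normal(3.0, 1.0, 2000))
hists = [np.histogram(s, bins=bins)[0] for s in samples]

# Pairwise Wasserstein distance matrix (symmetric, zero diagonal)
n = len(hists)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = wasserstein_distance(centers, centers,
                                 u_weights=hists[i], v_weights=hists[j])
        dist[i, j] = dist[j, i] = d

# metric="precomputed" tells DBSCAN the matrix already holds distances;
# eps=0.5 is an assumed threshold, not a universal choice
labels = DBSCAN(eps=0.5, min_samples=3, metric="precomputed").fit_predict(dist)
print(labels)  # the shifted histogram is labelled -1 (noise)
```

Points labelled `-1` are the outlier histograms; no frame of reference or $[0,1]$ normalization is needed, because DBSCAN compares the distances only to each other via `eps`.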