
I have two distinct probability density functions, and I would like a single summary measure of how different the two distributions are. Intuitively, it would make sense to me to compute the area between the two curves, as in the example shown below.

[Image: the area between two overlapping PDFs. Source: How to visualise the difference between probability distribution functions?]

The metric would range between 0 (perfect overlap) and 2 (no overlap), and would be computed as the integral of the absolute value of the difference between the two PDFs, $\int_{-\infty}^{\infty} |f_1(x) - f_2(x)|\,dx$. However, after searching the internet for a similar metric, I wasn't able to find any. What is the downside of such an apparently simple metric? Am I getting something wrong?

Edit: Image is for visualization purposes only. It is not mine and I am not trying to compare two samples (rather, two known PDFs).

A reference to a similar metric is made in this paper, see equation 2, but other than this I can't find anything close to it. Something close to what I mean, albeit for discrete distributions, is the dissimilarity index.

Does my proposed metric not make sense? I am asking this question because it seems one of the most intuitive ways to gauge the difference between two PDFs, yet I can find virtually no reference to it on the web.
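For concreteness, the proposed metric can be evaluated numerically for two known PDFs. A minimal sketch, assuming two normal densities and using scipy quadrature (the name `pdf_area_distance` is just illustrative):

```python
# Numerical sketch of the proposed metric: the integral of |f1(x) - f2(x)|.
# It is 0 when the PDFs coincide and approaches 2 when they do not overlap.
from scipy.integrate import quad
from scipy.stats import norm

def pdf_area_distance(f, g, lo=-40.0, hi=40.0):
    """Integrate |f(x) - g(x)| over [lo, hi] (chosen wide enough to cover both supports)."""
    value, _ = quad(lambda x: abs(f(x) - g(x)), lo, hi, limit=200)
    return value

f = norm(loc=0, scale=1).pdf
g = norm(loc=1, scale=1).pdf

print(pdf_area_distance(f, f))  # identical PDFs -> 0.0
print(pdf_area_distance(f, g))  # partial overlap -> approximately 0.766
```

For reference, half of this quantity is the total variation distance between the two distributions, and one minus that half is the overlapping coefficient, which is why the metric is bounded between 0 and 2.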

  • Did you look at the Kolmogorov-Smirnov metric? It's the max distance between CDFs – Aksakal Jun 30 '22 at 14:36
  • @Aksakal Yes, I know about the KS test but it's not really what I was looking for. I have found a paper that mentions something similar to my idea (https://lilia.dpss.psy.unipd.it/~massimiliano.pastore/papers/Pastore&Calcagni_2019.pdf). It links the non-overlapping area between two PDFs to the KL divergence, but I can't quite see how. – user362018 Jun 30 '22 at 14:42
  • Easier to deal with the CDF – Aksakal Jun 30 '22 at 14:43
  • Since, in a sense, the CDF is the area of a PDF, and since the Kolmogorov-Smirnov statistic $D$ measures the difference of greatest magnitude between two CDFs, it is difficult for me to understand why the Kolmogorov-Smirnov statistic does not meet your needs, as @Aksakal mentioned. – Alexis Jun 30 '22 at 14:51
  • Because you complain that a well-known, general, theoretically understood metric is "not ... what I was looking for," you ought to edit your post to indicate more clearly what it is you are looking for. It is puzzling, too, that your graphics indicate you are working with samples. How, then, do you obtain densities? This detail might matter. If your objective is to compare samples, then why run your analysis through this intermediate step of estimating a density when you could compare the samples directly? – whuber Jun 30 '22 at 15:26
  • @whuber Observation on samples makes the CDF even more convenient: the empirical CDF is much more accurate than the empirical PDF – Aksakal Jun 30 '22 at 15:41
  • @whuber thank you for your comment. The image I attached is only meant to offer a visualization of the quantity I am interested in, i.e. the area between the two PDFs. My post asked whether a metric of that kind exists / is useful in order to compare two known PDFs. I am asking this question because it seems one of the most intuitive distance measures, yet I couldn't find any reference to methods of this kind on the web (apart from a brief reference here: https://lilia.dpss.psy.unipd.it/~massimiliano.pastore/papers/Pastore&Calcagni_2019.pdf). Any ideas why? Does that metric not make sense? – user362018 Jun 30 '22 at 17:44
  • It's not very general, because many common, useful distributions do not have densities. Regardless, there are many metrics. See https://stats.stackexchange.com/search?q=pdf+overlap for instance. But any $L^p$ metric will work, too, as well as any metric that compares the distributions themselves (KS, Wasserstein, KD, etc). The real issue comes down to choosing among such a huge variety of metrics to meet your statistical needs, whatever they might be. – whuber Jun 30 '22 at 18:08
  • In 1D, the integral of the absolute value of the differences in two CDFs gives you the 1-Wasserstein distance: https://en.wikipedia.org/wiki/Wasserstein_metric#One-dimensional_distributions – Hypercube Jun 01 '23 at 20:29
  • See https://stats.stackexchange.com/questions/271582/is-the-intersection-area-between-2-pdfs-a-probability/271590#271590 – kjetil b halvorsen Jan 11 '24 at 01:12
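The one-dimensional Wasserstein identity mentioned in the comments can be sketched the same way, by integrating the absolute difference of the two CDFs rather than the PDFs (again assuming normal distributions and scipy quadrature; `wasserstein_1d` is an illustrative name):

```python
# 1-Wasserstein distance in one dimension: the integral of |F(x) - G(x)|,
# where F and G are the two CDFs.
from scipy.integrate import quad
from scipy.stats import norm

def wasserstein_1d(F, G, lo=-40.0, hi=40.0):
    """Integrate |F(x) - G(x)| over [lo, hi] (chosen wide enough to cover both supports)."""
    value, _ = quad(lambda x: abs(F(x) - G(x)), lo, hi, limit=200)
    return value

F = norm(loc=0, scale=1).cdf
G = norm(loc=1, scale=1).cdf

# For two normals differing only by a location shift, W1 equals the shift.
print(wasserstein_1d(F, G))  # approximately 1.0
```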

0 Answers