I am trying to develop a way to compare items. Each item has the same two properties associated with it, say, $p_{1}$ and $p_{2}$.

All I am given are these two measures for each item, each of which may be given as a range, say $x_{1} \le p_{1} \le x_{2}$ and $y_{1} \le p_{2} \le y_{2}$.

Do you think that something like a Euclidean distance for two items under comparison would make sense?

For two items, say, $a$ and $b$, that metric might be

$\sqrt{(x_{1_a}-x_{1_b})^2 + (x_{2_a}-x_{2_b})^2 + (y_{1_a}-y_{1_b})^2 + (y_{2_a}-y_{2_b})^2}$
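
To make the proposal concrete, here is a minimal Python sketch of that endpoint-based distance. The function name `endpoint_distance` and the optional `scales` argument are hypothetical, introduced only for illustration; the scaling is an assumption on my part, included because two properties measured in different units (such as density and molecular weight) would otherwise contribute very unequally to the sum.

```python
import math

def endpoint_distance(a, b, scales=(1.0, 1.0)):
    """Euclidean distance between the four range endpoints of two items.

    a, b   : tuples (x1, x2, y1, y2) with x1 <= p1 <= x2 and y1 <= p2 <= y2.
    scales : hypothetical per-property divisors (s1, s2), e.g. a typical
             spread of each property; (1, 1) reproduces the formula above.
    """
    s1, s2 = scales
    # Both endpoints of a property share that property's scale.
    divisors = (s1, s1, s2, s2)
    return math.sqrt(
        sum(((u - v) / s) ** 2 for u, v, s in zip(a, b, divisors))
    )

# Two items whose p2 ranges coincide and whose p1 ranges differ.
item_a = (0.0, 1.0, 0.0, 1.0)
item_b = (3.0, 5.0, 0.0, 1.0)
print(endpoint_distance(item_a, item_b))
```

With the default scales this is exactly the expression above; whether and how to rescale depends on the properties involved.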

Thank you.

  • Please tell us something that would help choose from the very many metrics one might consider. For instance, what metric would you use if you had perfectly accurate point values for the properties? Also, how are these ranges determined or measured? That might matter, too. – whuber Apr 21 '22 at 13:05
  • Thank you for the question. The problem is to find a measure for identifying similar chemical mixtures. Each mixture has a given range for each of two properties (e.g. density and molecular weight), but no distributions or informative statistics about these properties. I don't know if that helps or just adds to the confusion. – user86993 Apr 21 '22 at 14:39

2 Answers

1

As usual, it depends on the type of properties you are working with. In general, though, Euclidean distance is a good default measure of distance between observations.

0

IIUC, for each item $i$ you have a rectangle $r(i) = [x_{\min}(i), x_{\max}(i)]\times[y_{\min}(i), y_{\max}(i)]$ in the $(x,y)$-plane within which the properties $(p_1(i), p_2(i))$ of that item lie. One could think of this rectangle as the support of a uniform rectangular probability density function (pdf). Finding the distance between two items then translates into finding the distance between those two pdfs.

There are many notions of distance between pdfs, see e.g. here. One particularly intuitive choice is the Wasserstein metric.
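
As a rough sketch of how this could look in practice: if each item is modelled as a uniform distribution on its rectangle and the ground cost is squared Euclidean distance (both are assumptions, and the function names below are hypothetical), the 2-Wasserstein distance has a closed form. For product measures the squared distance is the sum over the two coordinates, and for two uniform intervals it equals the squared difference of midpoints plus one twelfth of the squared difference of widths.

```python
import math

def w2_squared_uniform_intervals(lo_a, hi_a, lo_b, hi_b):
    """Squared 2-Wasserstein distance between Uniform[lo_a, hi_a]
    and Uniform[lo_b, hi_b], obtained by integrating the squared
    difference of the two (linear) quantile functions over [0, 1]."""
    d_mid = (lo_a + hi_a) / 2 - (lo_b + hi_b) / 2
    d_width = (hi_a - lo_a) - (hi_b - lo_b)
    return d_mid ** 2 + d_width ** 2 / 12

def w2_uniform_rectangles(a, b):
    """2-Wasserstein distance between uniform distributions on two
    axis-aligned rectangles given as (x_min, x_max, y_min, y_max).

    Assumes the two properties are independent within each rectangle,
    so the squared distance is the sum of the per-axis terms."""
    ax1, ax2, ay1, ay2 = a
    bx1, bx2, by1, by2 = b
    sq_x = w2_squared_uniform_intervals(ax1, ax2, bx1, bx2)
    sq_y = w2_squared_uniform_intervals(ay1, ay2, by1, by2)
    return math.sqrt(sq_x + sq_y)

# Same-sized rectangles shifted by (3, 4): only the midpoints differ.
print(w2_uniform_rectangles((0, 1, 0, 1), (3, 4, 4, 5)))
```

Under these assumptions the result depends only on the differences in midpoint and width along each axis. As with any Euclidean-type cost, the two properties may need to be rescaled to comparable units first.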

frank
  • Thank you for the ideas. I had recently read an article about the earth mover's distance, but hadn't thought about applying it here using a rectangular pdf. – user86993 Apr 21 '22 at 14:35
  • Please see https://stats.meta.stackexchange.com/questions/6304/my-upvoting-policy. When you find a question sufficiently clear to write an answer, consider upvoting the question! – kjetil b halvorsen Apr 21 '22 at 15:00
  • @kjetilbhalvorsen That is usually the way I do it, but unfortunately I keep forgetting. Thanks for the reminder! – frank Apr 21 '22 at 15:35