Let $\textbf{X}$ and $\textbf{Y}$ be probability vectors, meaning that $\textbf{X} = [x_1, x_2, \ldots, x_n]^T$, where $0 \leq x_i \leq 1$ and $\sum_{i=1}^{n}x_i=1$ ($\textbf{Y}$ is defined similarly).

Define the Jaccard distance as

\begin{equation} J_d = 1 - \frac{\textbf{X}\cdot\textbf{Y}}{\textbf{X}\cdot \textbf{X}+ \textbf{Y}\cdot\textbf{Y} - \textbf{X}\cdot\textbf{Y}} \end{equation}

Is $J_d$ a proper distance (i.e., metric)?

Yaz
  • 21

1 Answer

No, it is not. For fixed $Y$, fixed nonzero $Z=(z_1,\ldots,z_n)$ with $\sum z_i=0$, and small $t$, we have $J_d(Y\pm tZ, Y)\sim t^2 \frac{Z\cdot Z}{Y\cdot Y}$, but $$J_d(Y-tZ,Y+tZ)\sim 4t^2\frac{Z\cdot Z}{Y\cdot Y}>J_d(Y-tZ,Y)+J_d(Y,Y+tZ),$$ so the triangle inequality fails.
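
One way to see where these asymptotics come from (a sketch, not necessarily the route the answer had in mind) is to put the two terms of $J_d$ over a common denominator:

$$J_d(X,Y) = \frac{X\cdot X + Y\cdot Y - 2\,X\cdot Y}{X\cdot X + Y\cdot Y - X\cdot Y} = \frac{\|X-Y\|^2}{X\cdot X + Y\cdot Y - X\cdot Y}.$$

For the pair $(Y\pm tZ,\,Y)$ the numerator is $t^2\, Z\cdot Z$ and the denominator is $Y\cdot Y \pm t\,Y\cdot Z + t^2\, Z\cdot Z$. For the pair $(Y-tZ,\,Y+tZ)$ the numerator is $4t^2\, Z\cdot Z$ and the denominator is $Y\cdot Y + 3t^2\, Z\cdot Z$. In both cases the denominator tends to $Y\cdot Y$ as $t\to 0$, so the long side behaves like $4t^2\,\frac{Z\cdot Z}{Y\cdot Y}$ while the two short sides together behave like $2t^2\,\frac{Z\cdot Z}{Y\cdot Y}$.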

Fedor Petrov
  • 102,548
  • I believe the vector $Z$ should be constrained such that $Y + tZ$ remains in the space of probability vectors. If this constraint is applied to $Z$, is the triangle inequality still violated? – Yaz May 12 '21 at 15:18
  • If all coordinates of $Y$ are positive and $\sum z_i=0$, it remains violated for small values of $t$. – Fedor Petrov May 12 '21 at 15:47
  • If you want a concrete example, take $Y=(0.5,0.5)$, $Z=(0.1,-0.1)$ and $t=1$ in Fedor's answer. You obtain $(0.5,0.5)$, $(0.4,0.6)$ and $(0.6,0.4)$ that indeed violate the triangle inequality. – Jukka Kohonen May 12 '21 at 18:59
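
The concrete example from the last comment can be checked numerically. The following is a minimal sketch in plain Python; the helper names `dot` and `jaccard_distance` are illustrative choices, not taken from the thread, and the function simply evaluates the formula for $J_d$ given in the question.

```python
def dot(u, v):
    """Euclidean dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def jaccard_distance(x, y):
    """J_d = 1 - X.Y / (X.X + Y.Y - X.Y), as defined in the question."""
    xy = dot(x, y)
    return 1.0 - xy / (dot(x, x) + dot(y, y) - xy)


# Points from the comment: Y = (0.5, 0.5), Z = (0.1, -0.1), t = 1.
a = (0.4, 0.6)   # Y - tZ
y = (0.5, 0.5)   # Y
b = (0.6, 0.4)   # Y + tZ

d_ay = jaccard_distance(a, y)
d_yb = jaccard_distance(y, b)
d_ab = jaccard_distance(a, b)

print(f"J_d(a, y) = {d_ay:.4f}")
print(f"J_d(y, b) = {d_yb:.4f}")
print(f"J_d(a, b) = {d_ab:.4f}")
print("triangle inequality violated:", d_ab > d_ay + d_yb)
```

By hand, $J_d(a,y) = J_d(y,b) = 0.02/0.52 = 1/26 \approx 0.0385$ and $J_d(a,b) = 0.08/0.56 = 1/7 \approx 0.1429$, so the direct distance exceeds the sum of the two legs ($\approx 0.0769$). All three points are valid probability vectors.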