
For two random variables $X_i$ and $X_j$, I define $d(X_i,X_j) = \sqrt{2(1-\rho_{i,j})}$, where $\rho_{i,j}$ is the Pearson correlation coefficient. I want to show that $d$ is a pseudometric.

The four axioms can be found here: https://en.wikipedia.org/wiki/Metric_(mathematics)

The axioms of nonnegativity and of symmetry are trivial. If I am not mistaken, the identity of indiscernibles does not hold: if $X_i = a X_j$ with $a > 0$, then $\rho_{i,j} = 1$ and $d(X_i, X_j) = 0$ even though $X_i \ne X_j$. That is why I want to show that it is "only" a pseudometric.
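As a quick sanity check of this failure (a minimal NumPy sketch; the variables `x` and `y` are my own example, not from the thread):

```python
import numpy as np

# Illustration: x and 3x + 1 are different variables, but they are
# perfectly correlated, so d = sqrt(2(1 - r)) is zero and the
# identity of indiscernibles fails.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3 * x + 1                               # y != x, yet corr(x, y) = 1
r = np.corrcoef(x, y)[0, 1]
d = np.sqrt(np.clip(2 * (1 - r), 0, None))  # clip guards against rounding
assert np.isclose(d, 0.0)
```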

My main question is about the triangle inequality. How can I check that it holds? If $\operatorname{cor}(X,Y) = a$, $\operatorname{cor}(Y,Z) = b$ and $\operatorname{cor}(X,Z) = c$, then by positive semidefiniteness of the correlation matrix, $b \in [ac-\sqrt{(1-a^2)(1-c^2)}, \ ac+\sqrt{(1-a^2)(1-c^2)}]$ (the determinant of the $3 \times 3$ correlation matrix must be nonnegative).
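This bound can be checked numerically (my own sketch, assuming NumPy): at the upper endpoint for $b$, the $3 \times 3$ correlation matrix is exactly singular, i.e. it sits on the boundary of the PSD region:

```python
import numpy as np

# At b = a*c + sqrt((1-a^2)(1-c^2)), the correlation matrix
#   [[1, a, c], [a, 1, b], [c, b, 1]]
# has determinant 1 + 2abc - a^2 - b^2 - c^2 = 0.
rng = np.random.default_rng(0)
for _ in range(1000):
    a, c = rng.uniform(-0.99, 0.99, size=2)
    b = a * c + np.sqrt((1 - a**2) * (1 - c**2))
    R = np.array([[1, a, c],
                  [a, 1, b],
                  [c, b, 1]])
    assert abs(np.linalg.det(R)) < 1e-12
```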

I want to show that $\sqrt{2(1-a)}+\sqrt{2(1-b)} \geq \sqrt{2(1-c)}$

In the "worst" case, $b$ attains its upper bound $ac+\sqrt{(1-a^2)(1-c^2)}$, which minimizes the left-hand side.

I tried to study the function $a \mapsto \sqrt{2(1-c)} - \sqrt{2(1-a)} - \sqrt{2(1-b)}$, with $b$ replaced by its upper bound (and the same function of $c$), to show that it is always nonpositive on $[-1,1]^2$, but it soon gets intractable. I could only check the result numerically. I imagine there are much better ways to solve the problem. Do you know any of them?
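For reference, the numerical check I mean can be done on a grid (a minimal NumPy sketch; the grid resolution is arbitrary):

```python
import numpy as np

# Evaluate f(a, c) = sqrt(2(1-c)) - sqrt(2(1-a)) - sqrt(2(1-b)),
# with b at its upper bound, over [-1, 1]^2, and confirm f <= 0,
# i.e. the triangle inequality holds even in the worst case.
grid = np.linspace(-1, 1, 401)
a, c = np.meshgrid(grid, grid)
b = np.clip(a * c + np.sqrt((1 - a**2) * (1 - c**2)), -1, 1)
f = np.sqrt(2 * (1 - c)) - np.sqrt(2 * (1 - a)) - np.sqrt(2 * (1 - b))
assert np.all(f <= 1e-12)   # small tolerance for floating-point error
```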

PS: I have found the thread "Is triangle inequality fulfilled for these correlation-based distances?" and the answer of ttnphns, but I could not find a proof of the fact that if the correlation matrix is PSD then $d$ is Euclidean. Isn't that at odds with the identity of indiscernibles?

Petreius
  • a proof of the fact that...d is Euclidean. But did you see there the link to this? The distance is Euclidean because Pearson correlation is the cosine between centered vectors (variables); its formula is the formula of cosine similarity. – ttnphns Sep 01 '16 at 00:08
  • Yes, I have seen the link. But is the fact that Pearson correlation obeys the law of cosines (because the covariance is indeed a scalar product) enough to claim that the distance is Euclidean? Also, a Euclidean distance is a metric, but it seems to me that the distance I consider does not satisfy the axiom "d(x,y) = 0 iff x = y". If X is proportional to Y, for instance, the distance is zero because X and Y are perfectly correlated. – Petreius Sep 01 '16 at 00:30
  • What you say is correct, but I wonder why you still need a proof. 1) By the cosine theorem formula, the distance that is tied to the scalar product is Euclidean (if the vectors are considered in Euclidean space). 2) $r$ is the cosine corresponding to the scalar product when the two variables are centered; also, if both variables are standardized to unit variance, then the cosine theorem becomes $d=\sqrt{2(1-r)}$. That distance is still Euclidean because centering/scaling doesn't make the space spanned by the vectors non-Euclidean. So this $d$ is simply the Euclidean $d$ between the transformed, standardized variables. – ttnphns Sep 01 '16 at 08:07
  • OK, I would perfectly agree with you if I could find the result "by the cosine theorem formula, the distance which is tied to the scalar product is Euclidean". I know that if the distance is Euclidean, then the cosine law holds, but I'm not sure of the converse. – Petreius Sep 01 '16 at 08:38
  • Both the Euclidean distance and the scalar product are properties of Euclidean space. The law of cosines establishes their relation. If you show that $\sum XY$ is a scalar product in Euclidean space and you believe in the law of cosines, then it automatically follows that $d$ is Euclidean. – ttnphns Sep 01 '16 at 08:56
  • I understand! So in the end, proving that this distance is a Euclidean distance boils down to proving that covariance is a scalar product, because we build $d$ in such a way that the distance and the scalar product are tied by the cosine theorem. Also, I think I solved my second problem with the axiom: up to rescaling and centering, two perfectly correlated variables are identical. – Petreius Sep 01 '16 at 09:03
  • I agree with you. – ttnphns Sep 01 '16 at 09:07
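The conclusion reached in the comments can be verified numerically (a sketch assuming NumPy; the correlated pair is my own example): $\sqrt{2(1-r)}$ coincides with the Euclidean distance between the centered, unit-norm versions of the two data vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.6 * x + rng.normal(size=500)   # a correlated pair

def standardize(v):
    """Center, then scale to unit Euclidean norm."""
    v = v - v.mean()
    return v / np.linalg.norm(v)

r = np.corrcoef(x, y)[0, 1]
d_corr = np.sqrt(2 * (1 - r))
d_eucl = np.linalg.norm(standardize(x) - standardize(y))
assert np.isclose(d_corr, d_eucl)
```

This works because for unit-norm centered vectors $\hat{x}, \hat{y}$ we have $\|\hat{x}-\hat{y}\|^2 = 2 - 2\langle \hat{x}, \hat{y}\rangle = 2(1-r)$.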
