
Show by an example that the correlation-based distance

$d(X,X^\prime)=1-\rho(X,X^\prime)=1-\frac{\sum_{j=1}^p (X_j-\bar{X})(X_j^\prime-\bar{X}^\prime)}{\sqrt{\sum_{j=1}^p (X_j-\bar{X})^2\sum_{j=1}^p (X_j^\prime-\bar{X}^\prime)^2}}$, where $X, X^\prime \in \mathbb R^p$,

is not strictly speaking a metric.

My understanding: we need to show that the distance $d$ violates the triangle inequality, but how can I show this with an example? I am kind of confused. Thank you so much for your suggestions!

Quantam
  • Since the triangle inequality concerns three points (random variables), it might help to understand the restrictions on correlation coefficients between multiple variables. Some relevant threads are https://stats.stackexchange.com/questions/72790, https://stats.stackexchange.com/questions/305441, https://stats.stackexchange.com/questions/445370, and https://stats.stackexchange.com/questions/256116. – whuber Apr 25 '21 at 14:16
  • @whuber What would be the appropriate example to show the violation of triangle inequality? I am kinda confused with the problem statement! – Quantam Apr 25 '21 at 14:19
  • Consider what the triangle inequality says and think about what combinations of correlation coefficients might make it false. It can help to consider extreme circumstances, such as when correlations are as positive or as negative as they can possibly be. It can also help to visualize correlations -- there are many ways to do so and likely you have encountered at least one. – whuber Apr 25 '21 at 14:22
  • There is a counterexample to the triangle inequality in this answer. – Adnan Baysal Aug 06 '21 at 19:32

1 Answer


A metric $d$ is a function $M\times M\rightarrow\mathbb R_{\ge 0}$ that satisfies four conditions.

  1. For any $x\in M$, $d(x, x)=0$.

  2. For any unequal $x,y\in M$, $d(x, y)>0$.

  3. For any $x,y\in M$, $d(x, y)=d(y,x)$.

  4. For any $x,y,z\in M$, $d(x, y) + d(y,z) \ge d(x,z)$. This is the triangle inequality, and it means that it is no shorter to leave work, go to the store, and then go home than it is to go straight home from work.

Let $M=\mathbb R^p$, and define a function $d: M\times M\rightarrow \mathbb R_{\ge 0}$ by $d(X, Y) = 1 - \rho(X, Y)$ for any two elements $X, Y\in M$.

  1. $\rho(X, X) = 1$, so $1-\rho(X,X)=0$. We are good here.

  2. Let $p=2$, let $X=\left(1, 2\right)$, and let $Y=\left(2, 4\right)=2X$. Thus, $X$ and $Y$ are distinct. Then $d(X, Y) = 1-\rho(X, Y) = 1-\rho(X,2X)$. However, because of the construction of $Y$ as $2X$, $\rho(X, Y)=\rho(X, 2X) = \rho(X, X)$. Consequently, $d(X, Y) = 1-\rho(X, Y) = 1-\rho(X,2X) = 1-1=0$, yet the inputs to the metric function are distinct. Overall, $d$ cannot be a metric. Further, the square root of this $d$ will have this same issue of giving an output of zero despite getting distinct inputs, so $\sqrt{1 - \rho(X, Y)}$ is not a metric, either.
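The zero-distance example can be checked numerically, and the same helper also gives a triangle-inequality counterexample of the kind the question asks for. This is only a sketch: the helper name `corr_distance` and the vectors `a`, `b`, `c` are hypothetical choices made for illustration, and it assumes NumPy is available.

```python
import numpy as np


def corr_distance(x, y):
    """Correlation-based distance d(x, y) = 1 - rho(x, y)."""
    # np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is rho(x, y).
    return 1.0 - np.corrcoef(x, y)[0, 1]


# Condition 2: distinct inputs with distance zero (X and 2X from the text).
x = np.array([1.0, 2.0])
y = 2 * x
d_scaled = corr_distance(x, y)

# Condition 4 (hypothetical vectors, chosen for illustration): a and c are
# uncorrelated, while b = a + c has correlation 1/sqrt(2) with each of them.
a = np.array([1.0, -1.0, 0.0, 0.0])
c = np.array([0.0, 0.0, 1.0, -1.0])
b = a + c
d_ab = corr_distance(a, b)
d_bc = corr_distance(b, c)
d_ac = corr_distance(a, c)

print(f"d(x, 2x)          = {d_scaled:.6f}")
print(f"d(a, b) + d(b, c) = {d_ab + d_bc:.6f}")
print(f"d(a, c)           = {d_ac:.6f}")
```

Up to floating-point rounding, $d(a,b)+d(b,c)=2-\sqrt 2\approx 0.586$ while $d(a,c)=1$, so going through $b$ is "shorter" than going directly from $a$ to $c$, and condition 4 fails as well.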

Dave
  • sqrt(1-r) is a euclidean distance, therefore it is metric. – ttnphns Feb 09 '23 at 17:04
  • @ttnphns Where does my example go awry? It looks like I can input distinct values yet get an output of zero, which cannot happen for a metric. – Dave Feb 09 '23 at 17:05
  • Well, sorry, I meant sqrt(1-r[x,y]) with standardized vectors x and y. Their norms are unit. But you mean the distance between the raw-length vectors. – ttnphns Feb 09 '23 at 18:06
  • @ttnphns Then isn't the metric space restricted to $\big\{x=\left(x_1,\dots,x_p\right)\in\mathbb R^p \,\big\vert\, \sum_i x_i^2 = 1\big\}$, rather than all of $\mathbb R^p$? – Dave Feb 09 '23 at 18:09
  • The space restricts to the unit p-sphere. Is that what you mean by your notation? – ttnphns Feb 09 '23 at 18:14
  • @ttnphns Isn't that the same as what you wrote about working with standardized vectors? – Dave Feb 09 '23 at 18:26
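
The comment exchange above turns on one identity, sketched here under the assumption that "standardized" means centered and then scaled to unit norm (the notation $\tilde x$ is introduced only for this illustration and does not appear in the thread). Writing $\tilde x = (x-\bar x\mathbf 1)/\lVert x-\bar x\mathbf 1\rVert$ and likewise $\tilde y$, we have $\rho(x,y)=\tilde x\cdot\tilde y$, and

$$\lVert\tilde x-\tilde y\rVert^2=\lVert\tilde x\rVert^2+\lVert\tilde y\rVert^2-2\,\tilde x\cdot\tilde y=2\bigl(1-\rho(x,y)\bigr).$$

So $\sqrt{1-\rho(x,y)}=\lVert\tilde x-\tilde y\rVert/\sqrt 2$ is a Euclidean distance between the standardized vectors, and hence a metric on that restricted set. On all of $\mathbb R^p$ it still returns zero for distinct inputs such as $X$ and $2X$, because both standardize to the same vector.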