If we want to calculate the squared distance between two vectors, $x$ and $y$, we expand the dot product:
$$||x-y||^2 = (x-y)(x-y)^T = xx^T - 2xy^T + yy^T$$
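For concreteness, the scalar identity is easy to verify numerically on two random vectors (a toy check of my own, not from the article):

```python
import numpy as np

x = np.random.randn(5)
y = np.random.randn(5)

lhs = np.sum((x - y)**2)             # ||x - y||^2
rhs = x @ x - 2 * (x @ y) + y @ y    # xx^T - 2xy^T + yy^T
assert np.allclose(lhs, rhs)
```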
The question is how to generalize this to more than one vector at a time, i.e. how to compute the Euclidean distance matrix between two sets of vectors, $X$ and $Y$.
In this article, the author gives a vectorized way to compute the Euclidean distance matrix:
import numpy as np

def compute_distances_no_loops(self, X):
    # ||x_i - x_j||^2 = ||x_i||^2 - 2 x_i . x_j + ||x_j||^2, for all pairs at once
    dists = (-2 * np.dot(X, self.X_train.T)
             + np.sum(self.X_train**2, axis=1)
             + np.sum(X**2, axis=1)[:, np.newaxis])
    return dists
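As a sanity check with toy arrays of my own (not from the article), the same expression outside the class matches a brute-force double loop:

```python
import numpy as np

X_train = np.random.randn(7, 3)   # toy "training" set
X_test = np.random.randn(4, 3)    # toy "test" set

# vectorized version (same expression as above, outside the class)
dists = (-2 * X_test @ X_train.T
         + np.sum(X_train**2, axis=1)
         + np.sum(X_test**2, axis=1)[:, np.newaxis])

# brute-force double loop for comparison
expected = np.array([[np.sum((a - b)**2) for b in X_train] for a in X_test])
assert np.allclose(dists, expected)   # shape (4, 7): test rows x train rows
```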
But I don't see where this came from. How could the above formula be derived?
Let $X$ be the $n \times p$ dataset between whose rows you want to compute the $n \times n$ matrix of squared Euclidean distances $D^2$. (1) $M = XX'$ (dot products between rows). (2) $h^2 = \operatorname{diag}(M)$, the column of squared row norms. (3) Replicate this column across columns via an outer product: $H^2 = h^2 o$, where $o$ is the row of $1$s of length $n$. (4) Last, $D^2 = H^2 + (H^2)' - 2M$. – ttnphns Mar 13 '19 at 13:44
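A minimal NumPy sketch of those four steps (my own translation of the comment, for the case of distances between the rows of a single matrix $X$):

```python
import numpy as np

X = np.random.randn(6, 3)           # n x p data matrix

M = X @ X.T                         # (1) dot products between rows, n x n
h2 = np.diag(M)                     # (2) squared row norms, length n
H2 = np.outer(h2, np.ones(len(X)))  # (3) replicate the column: h^2 o
D2 = H2 + H2.T - 2 * M              # (4) squared Euclidean distance matrix

# agrees with the broadcasting one-liner used in the question's code
assert np.allclose(D2, h2[:, None] + h2[None, :] - 2 * M)
```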