If we want to calculate the squared distance between two vectors, $x$ and $y$, we expand the dot product:
$$||x-y||^2 = (x-y)(x-y)^T = xx^T - 2xy^T + yy^T$$
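For concreteness, the scalar identity is easy to verify numerically on two random vectors (a toy check of my own, not from the article):

```python
import numpy as np

x = np.random.randn(5)
y = np.random.randn(5)

lhs = np.sum((x - y)**2)             # ||x - y||^2
rhs = x @ x - 2 * (x @ y) + y @ y    # xx^T - 2xy^T + yy^T
assert np.allclose(lhs, rhs)
```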
The question is how to generalize this to more than one vector at a time, i.e. how to compute the Euclidean distance matrix between two sets of vectors, $X$ and $Y$.
In this article, the author gives a vectorized way to compute the Euclidean distance matrix:
import numpy as np

def compute_distances_no_loops(self, X):
    # ||x_i - x_j||^2 = ||x_i||^2 - 2 x_i . x_j + ||x_j||^2, for all pairs at once
    dists = (-2 * np.dot(X, self.X_train.T)
             + np.sum(self.X_train**2, axis=1)
             + np.sum(X**2, axis=1)[:, np.newaxis])
    return dists
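As a sanity check with toy arrays of my own (not from the article), the same expression outside the class matches a brute-force double loop:

```python
import numpy as np

X_train = np.random.randn(7, 3)   # toy "training" set
X_test = np.random.randn(4, 3)    # toy "test" set

# vectorized version (same expression as above, outside the class)
dists = (-2 * X_test @ X_train.T
         + np.sum(X_train**2, axis=1)
         + np.sum(X_test**2, axis=1)[:, np.newaxis])

# brute-force double loop for comparison
expected = np.array([[np.sum((a - b)**2) for b in X_train] for a in X_test])
assert np.allclose(dists, expected)   # shape (4, 7): test rows x train rows
```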
But I don't see where this came from. How could the above formula be derived?
Let $X$ be the $n \times p$ dataset between whose rows you want to compute the $n \times n$ matrix of squared Euclidean distances $D^2$. (1) $M = XX'$ (dot products between rows). (2) $h^2 = \operatorname{diag}(M)$, the column of squared row norms. (3) Replicate this column across columns via an outer product: $H^2 = h^2 o$, where $o$ is the row of $1$s of length $n$. (4) Last, $D^2 = H^2 + (H^2)' - 2M$. – ttnphns Mar 13 '19 at 13:44
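A minimal NumPy sketch of those four steps (my own translation of the comment, for the case of distances between the rows of a single matrix $X$):

```python
import numpy as np

X = np.random.randn(6, 3)           # n x p data matrix

M = X @ X.T                         # (1) dot products between rows, n x n
h2 = np.diag(M)                     # (2) squared row norms, length n
H2 = np.outer(h2, np.ones(len(X)))  # (3) replicate the column: h^2 o
D2 = H2 + H2.T - 2 * M              # (4) squared Euclidean distance matrix

# agrees with the broadcasting one-liner used in the question's code
assert np.allclose(D2, h2[:, None] + h2[None, :] - 2 * M)
```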