0

K-means computes cluster centroids differently for each distance metric. I don't know why the way of computing the centroid is dependent of the distance measure.

I don't know how we compute the centroid for manhattan distance and its difference with the computing the centroid for euclidean distance?

curiosus
  • 303
  • This question is dubious because you don't say if you mean the distance between data poinrs or the distance between a point and a centroid. Please read a related post https://stats.stackexchange.com/q/81481/3277 – ttnphns Nov 14 '21 at 19:12

2 Answers2

3

K-means does not work with arbitrary distances, and was originally only formulated with squared errors.

It is for squared Euclidean distance and Bergman divergences.

K-means does not minimize Euclidean distances! It will run, and find an okayish (not obviously wrong) solution; but not even a local optimum. So the simple answer is: don't rely on k-means for other distances. It so not something you can easily fix.

  • You have an example here http://www.ece.northwestern.edu/local-apps/matlabhelp/toolbox/stats/kmeans.html. It says that kmeans computes centroid clusters differently for the different supported distance measures and it lists the way of computation centroid or center for each distance measure. – curiosus Oct 24 '18 at 08:32
  • 1
    Well, is not kmeans but a method known as kmedians if you use the component wise median. And that doesn't just work with some of the fast algorithms for kmeans. – Has QUIT--Anony-Mousse Oct 24 '18 at 20:39
  • 1
    For Euclidean distance, you "just" need to solve the Weber problem – Has QUIT--Anony-Mousse Oct 24 '18 at 20:40
  • Can you explain more in details the relation between weber problem and k-means and in what that answers the question about why the way of center computation is dependent to the distance measure? – curiosus Oct 24 '18 at 20:47
  • 1
    The Weber problem is the difficult problem of finding the most central point for Euclidean distance. That would be what you would need to use instead of the mean if you want to minimize Euclidean distances, essentially a k-weberpoint clustering. – Has QUIT--Anony-Mousse Oct 25 '18 at 01:18
0

@curiosus you are right, according to the definition of Kmeans for the Euclidean distance, the centroid whose coordinates are the average of the coordinates of the points of a cluster are simply a local minimum of the inertia, that is the value for which the derivatives of inertia with respect to the centroid are zero. The global minimum is another story.