k-means triangle inequality

Question

This page explains very good how k-means works: http://mnemstudio.org/clustering-k-means-example-1.htm

Somewhere I heard that there is some algorithms, which use triangle inequality to speed up the process. In which step, and how exactly can use triangle inequality?

Charles Elkan. Using the Triangle Inequality to Accelerate k-Means. Note please that (at least I think so) it is rather a new clustering algorithm and method, which only resembles k-means, and not a modification of the k-means. — ttnphns, Jan 09 '15 at 07:47
Elkan's method is a modification of Lloyd's algorithm (the most popular way of solving k-means), and produces identical results. The acceleration simply comes from avoiding unnecessary distance computations. These computations are unnecessary because they couldn't have changed the results. — user20160, Jun 25 '16 at 17:53

Xiaowei Song · Accepted Answer · 2015-01-10T04:01:02.023

Using triangle inequality to find the bounds to avoid redundant distance calculations.

Let x be a point and let b and c be centers.

Lemma 1: If d(b,c)>=2d(x,b), then d(x,c)>=d(x,b)

If d(b,c)/2 >= d(x,b), then d(x,b) <= d(x,c), thus no need to calculate d(x,c).

If we know an upper bound u>=d(x,c), but u >d(b,c)/2, then we need compute d(x,b) and d(x,c)

Lemma 2: d(x,c)>=max{0, d(x,b)-d(b,c)} >=max{0, l'-d(b,c)}

where l' is a lower bound of d(x,b) in the previous iteration's calculation

The optimized algorithm is based on the fact that most distance calculations in standard K-means are redundant. If a point is far away from a center, it is not necessary to calculate the exact distance between the point and the center in order to know that the point should not be assigned to this center. Conversely, if a point is much closer to one center than to any other, calculating exact distances is not necessary to know that the point should be assigned to the first center.

Details are in the original paper about acceleration: Charles Elkan. Using the triangle inequality to accelerate k-means. In Tom Fawcett and Nina Mishra, editors, ICML, pages 147–153. AAAI Press, 2003.

Other references: Greg Hamerly, Making k-means even faster In proceedings of the 2010 SIAM international conference on data mining (SDM 2010), April 2010. pdf!

Can you add a precis of the paper & the specific answer to the OP's question in it here? At present this is more of a comment than an answer. — gung - Reinstate Monica, Jan 10 '15 at 02:48

k-means triangle inequality

1 Answers1