
I'm going over the Agglomerative Clustering algorithm in sklearn.cluster.AgglomerativeClustering. It supports four linkage methods:

  • Ward minimizes the sum of squared differences within all clusters. It is a variance-minimizing approach and in this sense is similar to the k-means objective function but tackled with an agglomerative hierarchical approach.
  • Maximum or complete linkage minimizes the maximum distance between observations of pairs of clusters.
  • Average linkage minimizes the average of the distances between all observations of pairs of clusters.
  • Single linkage minimizes the distance between the closest observations of pairs of clusters.

As far as I understand, these four methods seem to be deterministic, i.e., there is no need to run any of them over many iterations to "converge" to a stable solution (a small check of this is sketched below).
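For concreteness, here is a minimal sketch (my own, assuming a recent scikit-learn and a toy `make_blobs` dataset, not part of the original question) that fits each of the four linkages twice on the same data and checks that the labels come out identical:

```python
# Minimal sketch (not from the original post): fit AgglomerativeClustering with
# each of the four linkage methods on a toy dataset, twice, and verify that the
# labels are reproduced exactly -- the procedure is deterministic.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

for linkage in ("ward", "complete", "average", "single"):
    labels_1 = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(X)
    labels_2 = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(X)
    assert np.array_equal(labels_1, labels_2)
    print(f"{linkage}: identical labels on repeated runs")
```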

But Wikipedia's Hierarchical clustering article states that:

Except for the special case of single-linkage, none of the algorithms (except exhaustive search in $O(2^{n})$) can be guaranteed to find the optimum solution.

This confuses me. Does this mean that there is a stochastic process involved in any of these methods?

Gabriel

1 Answer


Even deterministic optimization algorithms are often only approximate. (And clustering algorithms are optimization algorithms, since they try to minimize or maximize some objective function, typically a measure of impurity.)

In some cases, no algorithms are known that are guaranteed to reach an optimal solution in feasible time for large instances, so we are forced to use approximations. When we are lucky, we at least have guarantees on the approximation error.
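To make that concrete (this is my own sketch, not part of the original answer): on a dataset small enough to enumerate every 2-cluster partition, you can compare the within-cluster sum of squares reached by greedy Ward linkage against the exhaustive optimum. The greedy result may coincide with the optimum on easy data, but nothing guarantees it.

```python
# Sketch (my own): compare greedy Ward agglomeration against exhaustive search
# over all 2-cluster partitions, scoring both with the within-cluster sum of
# squares that Ward tries to keep small.
from itertools import combinations

import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))  # small enough for exhaustive search


def wss(X, labels):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    return sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
               for k in np.unique(labels))


greedy = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)

best = np.inf
n = len(X)
for size in range(1, n // 2 + 1):          # every non-trivial bipartition
    for subset in combinations(range(n), size):
        labels = np.zeros(n, dtype=int)
        labels[list(subset)] = 1
        best = min(best, wss(X, labels))

print("Greedy Ward WSS:       ", wss(X, greedy))
print("Exhaustive optimum WSS:", best)     # greedy is never better, may be worse
```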

So: no, being nonstochastic is not a sufficient condition for an algorithm to always yield an optimal result.

Stephan Kolassa
  • So, if I'm understanding correctly: although these methods are nonstochastic, they will still converge to different solutions over multiple runs due to being approximate? Is this correct? – Gabriel Apr 24 '20 at 16:43
  • No, given the same starting point, they will always converge to the same optimum if they are nonstochastic. It's just that this optimum may only be a local one, not a global one. It's often a good idea to run optimization algorithms, whether stochastic or not, multiple times with different starting values. – Stephan Kolassa Apr 25 '20 at 07:35
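As an illustration of that last point (my own example, using k-means since its objective is the one Ward mimics, as noted in the question): running the same optimizer from different starting points can land in different local optima, which is why restarting and keeping the best run is common practice.

```python
# Illustration (my own): k-means started from different random initializations
# can converge to different local optima of its objective (the inertia);
# keeping the run with the smallest inertia is the usual multi-restart trick.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=5, cluster_std=2.0, random_state=1)

inertias = [
    KMeans(n_clusters=5, init="random", n_init=1, random_state=seed).fit(X).inertia_
    for seed in range(10)
]
print(sorted(inertias))  # restarts may reach different objective values
```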