1

What clustering methods are suitable for weighted graphs, where the weights cannot be interpreted as a metric ? (e.g. they do not respect the triangle inequality). At the moment I found Markov clustering and affinity propagation. Are there others ?

Intuitevely e.g. single linkage clustering has issues because if e.g. we have two clusters connected by an edge of weight 0.3, we could have a path of length two and weight 0.1+0.1=0.2<0.3 shortcutting it, losing the intuition behind the methodology ( I am considering here the case where the weights constitute a dissimilarity measure ).

K-Means, DBSCAN ( and his brother HDBSCAN ) intuitevely to me seem also to rely on metric properties, but I am not sure and cannot find a clear discussion at the moment. I am not sure my understanding is deep enough to drive conclusions.

Thomas
  • 880
  • 4
  • 16
  • I could not understand your case with the problematic single linkage. Can you explain with more detail what's wrong with it. – ttnphns Apr 25 '22 at 07:08
  • Mmm... that is just an example not the core of the question. – Thomas Apr 25 '22 at 11:46
  • Having said this, the point I was raising is that single linkage considers edges between the closest point to determine the distance between the clusters. If our weights respect the triangle inequality and two points p1 and p2 are connected by an edge of weight W, this means that for any other path connecting p1 and p2 the sum of the weights is higher W. So edges to me in this case have a stronger meaning of tightness. Ifwe donot have adistance, "edges" can be shortcut by paths of lengths greater than2, soI thought the clusteringmade considering connectivity via single edges made less "sense". – Thomas Apr 25 '22 at 11:50
  • But again these are just my thoughts, that is why I made a question... – Thomas Apr 25 '22 at 11:52
  • Why not do hierarchical clustering by, say, single, complete or average method (https://stats.stackexchange.com/a/217742/3277) based on the input distance matrix where a distance between each pair of vertices is the shortest path length between them (rather than their immediate distance)? Obtain the shortest paths matrix by Floyd-Warshall. – ttnphns Apr 25 '22 at 16:49
  • Sounds reasonable why not ? This way one would get a distance matrix starting from a general unweighted graph ,and than one could apply any clustering technique requiring a metric. Is it often done in practice ? – Thomas Apr 25 '22 at 19:11
  • I think of "often enough", because the linkage methods I cited are generally considered suitable for any (dis)similarity measures; therefore why not for shortest paths in a graph? – ttnphns Apr 25 '22 at 21:49
  • Thanks. I think the idea of deriving this distance from a weighted graph is nice. It would be great to have backed it up by (the link to) some examples/comparison w.r.t. other methods... I could not find much googling but maybe I googled wrongly... – Thomas Apr 26 '22 at 07:20

0 Answers0