Which clustering methodologies are likely to be best for this data?

Question

I'm using the classic "use-case" example of clustering pixels in a photograph. I've tried K-means, agglomerative clustering, and DBSCAN. When I plot the RGB coordinates in 3-D space, all 3 techniques return variations of a similar clustering scheme. As an example, here is how K-means clustered the data:

What I'd like to see, if possible, is a clustering scheme that adheres more closely to what a human observer would identify as the "natural" clusters evident in the dataspace. Here's an example (note that this is the same data, where the color of each pixel represents its actual RGB value in the photograph):

I'm trying to capture the long, narrow striations that I'm seeing.

Any advice would be much appreciated!

I don't believe the best way to think about this is that there will just be some algorithm that gives you what you want. First, think about your problem. What distance metric makes sense for it? Then you can either use a method that operates on the distance matrix, or transform the data so that Euclidean distance in the transformed space corresponds to the appropriate distance in the original space. — gung - Reinstate Monica, May 13 '22 at 20:35
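To make the first option in the comment above concrete, here is a minimal sketch of clustering from a precomputed distance matrix. It assumes cosine distance (the metric raised later in this thread) and average-linkage hierarchical clustering from SciPy; both are illustrative choices, not something the comment prescribes. The `pixels` array is a synthetic stand-in for the real RGB data: two noisy rays leaving the origin.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Synthetic stand-in for the photo's pixels (assumption): two noisy rays
# that both start at the origin of RGB space and point in different directions.
t = rng.uniform(0.2, 1.0, size=(100, 1))
ray_a = t * np.array([1.0, 0.2, 0.1]) + rng.normal(0.0, 0.01, size=(100, 3))
ray_b = t * np.array([0.1, 0.3, 1.0]) + rng.normal(0.0, 0.01, size=(100, 3))
pixels = np.vstack([ray_a, ray_b])

# Condensed pairwise distance matrix under the chosen metric.
distances = pdist(pixels, metric="cosine")

# Hierarchical clustering works directly on the distances,
# so any metric that suits the problem can be swapped in here.
tree = linkage(distances, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
```

The pairwise matrix grows quadratically with the number of pixels, so for a full photograph this would probably need to run on a random subsample.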
I'd give DBSCAN a shot. The choice of parameters might be tricky, but it may well be possible to find good ones. — Christian Hennig, May 13 '22 at 21:18
@gung - Reinstate Monica - This is a helpful suggestion, thanks. The problem is that I don't know what distance metric would be most appropriate. Cosine distance seems reasonable at first glance, since I'm trying to capture clusters that are long and narrow and that generally project out from a vertex, but I'm still having trouble getting it right. What would be your first few ways of approaching this? How might you transform the data, choose a distance metric, or select a clustering algorithm given this particular data? — NaiveBae, May 14 '22 at 22:31
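One way to try the cosine idea with an ordinary Euclidean method is to scale every pixel to unit length first. For unit vectors, squared Euclidean distance equals twice the cosine distance, so k-means on the normalized data groups pixels by direction rather than by brightness. The sketch below assumes the vertex the striations project from is the origin (black); if it sits elsewhere, that point would have to be subtracted first. The data are synthetic and `kmeans2` from SciPy is just one convenient k-means implementation.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)

# Synthetic stand-in for the pixels (assumption): two noisy rays from the origin.
t = rng.uniform(0.2, 1.0, size=(100, 1))
ray_a = t * np.array([1.0, 0.2, 0.1]) + rng.normal(0.0, 0.01, size=(100, 3))
ray_b = t * np.array([0.1, 0.3, 1.0]) + rng.normal(0.0, 0.01, size=(100, 3))
pixels = np.vstack([ray_a, ray_b])

# Project every pixel onto the unit sphere; the small floor guards against
# division by zero for pure-black pixels.
norms = np.linalg.norm(pixels, axis=1, keepdims=True)
unit = pixels / np.maximum(norms, 1e-12)

# Check the identity ||u - v||^2 = 2 * (1 - cos(u, v)) on one pair of pixels.
u, v = unit[0], unit[150]
cosine_distance = 1.0 - u @ v
squared_euclidean = np.sum((u - v) ** 2)

# Plain k-means on the normalized data now separates pixels by direction.
_, labels = kmeans2(unit, 2, minit="++", seed=0)
```

Normalizing throws away brightness entirely, which is the point here but may not suit every image.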
Some general guidelines for selecting a clustering method: https://stats.stackexchange.com/q/195456/3277 — ttnphns, May 16 '22 at 16:18
If your real data are what is shown in the 2nd picture, the clusters tend to be oblong, roughly parallel, and skewed in the same way. This is not a very difficult case. Try to find the principal direction (or even rotate the cloud manually) along which to transform the distributions toward greater symmetry. After that, even k-means could be applied successfully. — ttnphns, May 16 '22 at 16:33
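A rough sketch of that last suggestion, under several assumptions: the shared elongation direction is taken to be the first principal axis (found with an SVD), the "transformation" is a plain rescaling along that axis by a hypothetical factor `shrink`, and the data are two synthetic parallel striations. Correcting skewness would need a further nonlinear transform along the same axis, which is not attempted here.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in (assumption): two parallel, elongated clouds in RGB space.
direction = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)      # shared long axis
offset = 0.1 * np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)  # perpendicular shift

def striation(center):
    # Points spread widely along `direction` and tightly around it.
    t = rng.uniform(-0.4, 0.4, size=(n, 1))
    return center + t * direction + rng.normal(0.0, 0.01, size=(n, 3))

pixels = np.vstack([striation(0.5 + offset), striation(0.5 - offset)])

# Principal axes of the centered cloud; row 0 of vt is the direction of
# greatest variance, which here is the shared elongation direction.
centered = pixels - pixels.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt.T

# Compress the cloud along the long axis so the clusters become roughly round.
shrink = 0.05  # hypothetical factor, chosen by eye for this synthetic data
scores[:, 0] *= shrink

# After the rescaling, ordinary k-means can separate the striations.
_, labels = kmeans2(scores, 2, minit="++", seed=0)
```

On real pixels the first principal axis only matches the striation direction if the elongation dominates the spread between clusters; otherwise the direction has to be picked by hand, as the comment suggests.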
