The silhouette score gives quite misleading results in my case. Are there any alternatives?

My data are word embeddings, where each word belongs to one of 20+ classes. I want to measure the "clusteredness" of words that share a label. Often, words with the same label cluster into 2-3 well-defined blobs, which is good; other times they spread out evenly over the entire plot, like background noise, which is bad. Is there a measure that quantifies this?

  • The "clusteredness" of observations also depends on the chosen number of clusters. Have you tried varying that? – utobi Oct 11 '22 at 14:22
  • @utobi The classes are premade, not a variable, so every word already has a label. I'm interested in how well my resulting embedding data is organized according to those premade classes. – oliver.c Oct 11 '22 at 14:39
  • It looks like you have one partition (classes) and another partition (clusters), and you want to measure how much the two agree. Right? Then you need one of the external clustering criteria. – ttnphns Oct 11 '22 at 21:04
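
To make the "external clustering criteria" idea from the last comment concrete, here is a minimal sketch of one such criterion, the adjusted Rand index (ARI), written in plain Python. It assumes you already have two label lists of equal length: the premade classes and cluster assignments obtained from some clustering of the embeddings. The function name `adjusted_rand_index` and the toy labels are illustrative only, not something from the question.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(classes, clusters):
    """Adjusted Rand index between two partitions of the same items.

    1.0 means the partitions are identical up to renaming of labels;
    values near 0 mean agreement is about what chance would give.
    """
    if len(classes) != len(clusters):
        raise ValueError("both label lists must have the same length")

    total_pairs = comb(len(classes), 2)
    if total_pairs == 0:  # fewer than two items: nothing to compare
        return 1.0

    # Contingency counts: items sharing a given (class, cluster) pair.
    pair_counts = Counter(zip(classes, clusters))
    class_counts = Counter(classes)
    cluster_counts = Counter(clusters)

    # Number of item pairs placed together in both / in each partition.
    together_both = sum(comb(c, 2) for c in pair_counts.values())
    together_class = sum(comb(c, 2) for c in class_counts.values())
    together_cluster = sum(comb(c, 2) for c in cluster_counts.values())

    expected = together_class * together_cluster / total_pairs
    maximum = (together_class + together_cluster) / 2
    if maximum == expected:  # degenerate case, e.g. one group in both partitions
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Toy example (made-up labels): clusters match classes up to renaming.
classes = ["animal", "animal", "tool", "tool"]
clusters = [1, 1, 0, 0]
print(adjusted_rand_index(classes, clusters))
```

One caveat for the situation described in the question: ARI penalizes a class that is split across several clusters, so a label that forms 2-3 tight blobs may still score lower than a label that forms a single blob. Whether that is acceptable depends on what "well organized" should mean for the embedding.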
