Say I have two different types of data recorded on 30 different samples of a medicine. As a little background: one is spectral data and the other is process data, and the differences observed in the two may or may not be related. A clustering algorithm was run separately on each data set, dividing the samples into 5 clusters in each case. What could I use to find out whether the two resulting sets of cluster assignments are related?

I assume this means I am looking for a correlation between two nominal categorical variables. This is not something I have experience with. I've used Cramér's V; is this appropriate? I feel this has a simple answer that I'm overlooking. Thanks!
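For concreteness, here is a minimal sketch of what Cramér's V computes for two label vectors. It is an illustration only, not the code actually used on the data: the function name `cramers_v` and the toy labels are hypothetical, and no bias correction is applied. With only 30 samples spread over a 5 x 5 table, many expected counts will be small, so the uncorrected value tends to be inflated and is best read as a rough descriptive measure.

```python
from collections import Counter
from math import sqrt


def cramers_v(labels_a, labels_b):
    """Uncorrected Cramér's V for two equal-length sequences of nominal labels."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both label sequences must have the same length")

    joint = Counter(zip(labels_a, labels_b))  # cell counts of the contingency table
    row = Counter(labels_a)                   # row marginals
    col = Counter(labels_b)                   # column marginals

    # Pearson chi-squared statistic summed over every cell, including empty ones.
    chi2 = 0.0
    for a, n_a in row.items():
        for b, n_b in col.items():
            expected = n_a * n_b / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row), len(col)) - 1
    if k == 0:
        # Only one category on one side: V is undefined.
        return float("nan")
    return sqrt(chi2 / (n * k))


# Toy labels (made up): the same partition under different label names.
same = cramers_v([0, 0, 1, 1, 2, 2], ["x", "x", "y", "y", "z", "z"])
# Toy labels (made up): two partitions that are independent of each other.
unrelated = cramers_v([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the statistic depends only on the contingency table, it does not matter that the cluster labels from the two analyses use different codes.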

  • Do I understand correctly that you have n cases and two (or more) cluster partitions of them (each in the form of a nominal variable, where the cluster codes/labels can differ between them)? You want to compare the partitions to see how much they agree. True? – ttnphns Mar 17 '22 at 08:08
  • If I follow you correctly then yes, I think that sums it up – Betty Mar 17 '22 at 11:01
  • Then search the site and the internet for "external clustering criteria" and "external cluster validity indices", such as Adjusted Rand, F Clustering Accuracy, etc. They are different. The most important ones, with formulas, can be found on my web-page: download "Compare partitions" and read the description of the !KO_cluagree macro. – ttnphns Mar 17 '22 at 11:09
  • Glance also at https://stats.stackexchange.com/a/548869/3277, which describes what a pair confusion matrix is. – ttnphns Mar 17 '22 at 11:13
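As an illustration of one of the indices named in the comments, the Adjusted Rand index can be sketched from pair counts as below. This is a plain-Python sketch of the standard pair-counting formula, not the !KO_cluagree macro; the function name `adjusted_rand_index` and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same cases."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both label sequences must have the same length")
    if n < 2:
        raise ValueError("at least two cases are needed to form a pair")

    # Pairs of cases placed together in both partitions, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    # Agreement expected by chance given the cluster sizes, and its maximum.
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. every case in one cluster in both partitions.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Toy labels (made up): identical partitions with different label codes.
ari_same = adjusted_rand_index([0, 0, 1, 1, 2, 2], ["x", "x", "y", "y", "z", "z"])
# Toy labels (made up): partitions that disagree more than chance would predict.
ari_cross = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the two partitions are identical up to relabeling, values near 0 mean agreement no better than chance, and negative values mean less agreement than chance. If scikit-learn is available, `sklearn.metrics.adjusted_rand_score` should give the same numbers.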
