I have a set of data points with known ground-truth labels. I want to run hierarchical clustering on this dataset under the constraint that all points with the same label are clustered together. I know this somewhat defeats the usual purpose of clustering, where I would apply a clustering method to a dataset and compare the results with the ground truth. However, I want to use hierarchical clustering to determine the relationships between these predefined clusters and see how they merge. Any tips on doing this? I looked into R and Matlab to see whether any packages allow this, but it doesn't look like they do.
1 Answer
This should be easy, as long as your software is flexible enough to accept a custom distance function.
I'd try ELKI, and implement my own distance function:
$$ d(x,y)=\begin{cases} 0 & \text{if }x\text{ and }y\text{ are in the same class} \\ \varepsilon+\text{Euclidean}(x,y) & \text{otherwise} \end{cases} $$
Then hierarchical clustering should first merge all objects of the same class, and construct a hierarchy of the existing classes.
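To make the distance function above concrete, here is a minimal sketch in plain Python (not ELKI code). The name `label_aware_distance`, the toy points, and the value of `eps` are all assumptions for illustration; any small positive `eps` should work.

```python
import math

def label_aware_distance(x, y, label_x, label_y, eps=1e-6):
    """Return 0 for points of the same class, eps + Euclidean distance otherwise."""
    if label_x == label_y:
        return 0.0
    return eps + math.dist(x, y)

# Hypothetical toy data: two classes, two points each.
points = [(0.0, 0.0), (5.0, 5.0), (0.0, 1.0), (5.0, 6.0)]
labels = ["a", "b", "a", "b"]

# Full pairwise distance matrix under the label-aware distance.
n = len(points)
dist_matrix = [
    [label_aware_distance(points[i], points[j], labels[i], labels[j])
     for j in range(n)]
    for i in range(n)
]
```

A matrix like this can be handed to any hierarchical clustering tool that accepts precomputed distances. Single, complete, and average linkage all merge the zero-distance pairs first; Ward and centroid linkage assume plain Euclidean distances, so they are probably not a good fit here.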
Note that hierarchical clustering scales quite badly: it is in $\mathcal{O}(n^3)$ when implemented naively (ELKI has an $\mathcal{O}(n^2)$ implementation for single-linkage). You may actually get a substantial speedup by re-implementing hierarchical clustering yourself for your particular use case, starting at the point where the same-label clusters have already been formed.
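As a rough sketch of that last idea, the following plain-Python code starts with one cluster per label and only agglomerates those. The function name `cluster_classes`, the toy data, and the choice of single linkage (`min`) are assumptions for illustration; this is a naive loop that recomputes the linkage on every pass, not an optimized implementation.

```python
import math
from itertools import combinations

def cluster_classes(points, labels, linkage=min):
    """Agglomerate pre-formed label clusters.

    linkage=min gives single linkage, linkage=max gives complete linkage.
    Returns a list of merges as (labels_a, labels_b, height).
    """
    # Start where the same-label clusters have already been formed.
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    clusters = {frozenset([lab]): pts for lab, pts in groups.items()}

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        best = None
        for a, b in combinations(clusters, 2):
            d = linkage(math.dist(p, q) for p in clusters[a] for q in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(a), sorted(b), d))
        # Replace the two clusters by their union.
        clusters[a | b] = clusters.pop(a) + clusters.pop(b)
    return merges

# Hypothetical toy data: three classes along the x-axis.
points = [(0, 0), (0, 1), (3, 0), (3, 1), (10, 0), (10, 1)]
labels = ["a", "a", "b", "b", "c", "c"]
merges = cluster_classes(points, labels)
```

Since no same-class pair is ever compared, the $\varepsilon$ offset is not needed here, and the number of merge steps depends on the number of classes rather than on the number of points.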
to determine the relationship between these predefined clusters? – ttnphns Nov 27 '13 at 11:13