I have a large sparse data matrix X, and I use SciPy's implementation of Ward's hierarchical clustering like so:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import ward, dendrogram
Z = ward(X.todense())
fig = plt.figure(figsize=(25, 10))
dn = dendrogram(Z)

I now want to see which cluster each row X[i] belongs to. How can I do this?

jdoe

1 Answer

From the linkage matrix Z you can get the clusters with scipy.cluster.hierarchy.fcluster.

I assume you want the same clusters as the colors in the dendrogram. According to the docs, dendrogram sets color_threshold to 0.7*max(Z[:,2]) if nothing else is specified, so that is the cut height we will use.

For example:

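from sklearn.datasets import make_classification
from scipy.cluster.hierarchy import linkage, fcluster
X, y = make_classification(n_samples=10)
Z = linkage(X, method='ward')
# Same cut height that dendrogram uses by default for coloring
thresh = 0.7 * max(Z[:, 2])
# labels[i] is the cluster number of row X[i]
labels = fcluster(Z, thresh, criterion='distance')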
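
To tie this back to the question of which cluster X[i] falls in, here is a minimal sketch that looks up the label of a single row and groups row indices by cluster. The toy data (two separated blobs) and the names labels and members are my own assumptions for illustration; with your data you would pass your own linkage matrix Z instead.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data (assumed for illustration): two well-separated blobs of 5 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(5, 2)),
               rng.normal(5.0, 0.1, size=(5, 2))])

Z = linkage(X, method='ward')
thresh = 0.7 * Z[:, 2].max()  # dendrogram's default color_threshold
labels = fcluster(Z, thresh, criterion='distance')

# labels has one entry per row of X, so labels[i] is the cluster of X[i]
i = 3
print(f"X[{i}] is in cluster {labels[i]}")

# Group row indices by cluster label
members = {int(c): np.flatnonzero(labels == c).tolist()
           for c in np.unique(labels)}
print(members)
```

The labels returned by fcluster start at 1, not 0, so keep that in mind if you use them to index into anything.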

See also: "How to get flat clustering corresponding to color clusters in the dendrogram created by scipy".

KPLauritzen