
How can I change the legend? As the plot shows, some cluster numbers are missing from it. How can I adjust the legend so that it shows every cluster (Cluster 1, Cluster 2, etc.)? Right now it only shows 0, 3, 6, 9. (I followed the code from this link: Perform k-means clustering over multiple columns)

kmeans = KMeans(n_clusters=10)
y2 = kmeans.fit_predict(scaled_data)

reduced_scaled_data = PCA(n_components=2).fit_transform(scaled_data)
results = pd.DataFrame(reduced_scaled_data,columns=['pca1','pca2'])
sns.scatterplot(x="pca1", y="pca2", hue=y2, data=results)
#y2 is my cluster number

plt.title('K-means Clustering with 2 dimensions')
plt.show()


Edit: the legend colors no longer seem to match the plot; cluster 0 should be the lightest color.


Cecilia

1 Answer

Yeah this is an annoying weirdness of seaborn. Just pass legend='full' as a parameter to sns.scatterplot(). So your code becomes:

kmeans = KMeans(n_clusters=10)
y2 = kmeans.fit_predict(scaled_data)

reduced_scaled_data = PCA(n_components=2).fit_transform(scaled_data)
results = pd.DataFrame(reduced_scaled_data,columns=['pca1','pca2'])
sns.scatterplot(x="pca1", y="pca2", hue=y2, data=results, legend='full')
#y2 is my cluster number

plt.title('K-means Clustering with 2 dimensions')
plt.show()

I have no idea why you'd ever want half a legend! But the parameter defaults to 'brief'. Refer to the seaborn documentation:

legend : “brief”, “full”, or False, optional

How to draw the legend. If “brief”, numeric hue and size variables will be represented with a sample of evenly spaced values. If “full”, every group will get an entry in the legend. If False, no legend data is added and no legend is drawn.
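To see the difference between the two settings side by side, here is a minimal sketch. It does not use the asker's data: the points are random numbers and `y2` is a made-up stand-in for the KMeans labels (ten integer clusters, 0 to 9), and the `Agg` backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-ins for the PCA output and the cluster labels (not the real data)
rng = np.random.default_rng(0)
results = pd.DataFrame(rng.normal(size=(200, 2)), columns=["pca1", "pca2"])
y2 = np.arange(200) % 10  # ten numeric "clusters", 0..9

fig, (ax_brief, ax_full) = plt.subplots(1, 2, figsize=(12, 5))

# 'brief': a numeric hue is summarised by a sample of evenly spaced values
sns.scatterplot(x="pca1", y="pca2", hue=y2, data=results,
                legend="brief", ax=ax_brief)

# 'full': every distinct hue value gets its own legend entry
sns.scatterplot(x="pca1", y="pca2", hue=y2, data=results,
                legend="full", ax=ax_full)

brief_labels = [t.get_text() for t in ax_brief.get_legend().get_texts()]
full_labels = [t.get_text() for t in ax_full.get_legend().get_texts()]
print("brief:", brief_labels)
print("full: ", full_labels)
```

The exact values sampled for the brief legend can differ between seaborn versions, so the sketch prints them rather than assuming a particular set.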

edit: To have the clusters named in the format you're after, first create a list in the right format, and then call plt.legend() and pass that list as an argument to legend():

legend = []
for i in np.unique(y2):
    legend.append('Cluster {0}'.format(i))

sns.scatterplot(x="pca1", y="pca2", hue=y2, data=results, legend='full')
#y2 is my cluster number

plt.title('K-means Clustering with 2 dimensions')
plt.legend(legend)
plt.show()

To change the location of the legend, use the loc= parameter in the plt.legend() call. For example, plt.legend(legend, loc='upper right') or plt.legend(legend, loc='upper left').
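A variation on the renaming and placement steps above, as a sketch only: instead of passing just a list of labels, it reuses the handles seaborn already created and renames each one, so every label stays paired with its own marker colour. It also parks the legend outside the axes with `bbox_to_anchor` so it cannot cover data points. The data here is synthetic, `y2` is a made-up stand-in for the KMeans labels, and the figure size is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-ins for the PCA output and the cluster labels (not the real data)
rng = np.random.default_rng(0)
results = pd.DataFrame(rng.normal(size=(200, 2)), columns=["pca1", "pca2"])
y2 = np.arange(200) % 10  # ten numeric "clusters", 0..9

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(x="pca1", y="pca2", hue=y2, data=results, legend="full", ax=ax)

# Take the handle/label pairs seaborn registered and rename the numeric ones.
# Filtering on isdigit() skips any non-cluster entry (e.g. a title row that
# some seaborn versions add to the legend).
handles, labels = ax.get_legend_handles_labels()
pairs = [(h, "Cluster {0}".format(l)) for h, l in zip(handles, labels) if l.isdigit()]
new_handles = [h for h, _ in pairs]
new_labels = [l for _, l in pairs]

# Anchor the legend's left-centre to the right edge of the axes
ax.legend(new_handles, new_labels, loc="center left", bbox_to_anchor=(1.0, 0.5))
ax.set_title("K-means Clustering with 2 dimensions")
fig.tight_layout()

legend_texts = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_texts)
```

Whether this addresses the colour mismatch raised in the comments is not confirmed anywhere in the thread; it only guarantees that each label is attached to the handle it was generated from.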

Dan Scally
  • Hi, thanks, now I get all the cluster numbers, but I want to customise them from '0, 1, 2, 3...' to 'Cluster 0, Cluster 1...'. Also, the legend covers part of the plot, so I might need to make the plot bigger. Do you mind giving me some hints? Thanks again. Note that y2=[0,1,2,3...] @DanScally – Cecilia Aug 02 '19 at 08:21
  • @Cecilia that's reasonably easy too; see my edit. – Dan Scally Aug 02 '19 at 08:25
  • Hi, thank you soooo much! But now the legend is too large and covers my data points. I tried changing 'size_norm' but nothing happened. Is there a way to make the legend smaller or the whole plot larger? – Cecilia Aug 02 '19 at 08:33
  • @Cecilia you can use the loc parameter when calling plt.legend() to control where it goes. Passing plt.legend(legend, loc='best') will make it try to deconflict. Alternatively you can call fig = plt.figure(figsize=(36,12)) BEFORE sns.scatterplot() to change the size of the whole plot. There might also be a way to control the size of the legend itself; try the plt.legend() documentation: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html – Dan Scally Aug 02 '19 at 08:36
  • Thank you so much, I've solved it. But when I compared the adjusted legend with the plot, the first cluster, Cluster 0, should be the one with the lightest colour, and it's the darkest one now. Not sure why this happens, please see my edit. Thanks again. – Cecilia Aug 02 '19 at 08:43
  • @Cecilia oh yeah, that is weird. Not sure I'm afraid. I'll take a look at lunchtime – Dan Scally Aug 02 '19 at 08:59
  • Thanks a lot Dan:) – Cecilia Aug 02 '19 at 09:01
  • Hey, just wondering if you came up with any ideas? I tried to solve it but still have no idea... – Cecilia Aug 02 '19 at 13:55