
Is there something out there that does multidimensional hierarchical clustering?

I have looked in these places:

But with no success so far.

What I mean: finding groups the same way it is done with 2 dimensions, but with multiple dimensions.
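Hierarchical clustering never looks at the plot; it only needs the pairwise distances between observations, and Euclidean distance is defined the same way in 2 or in 5 dimensions. A small sketch with SciPy's `pdist` (random data with a fixed seed, used only for illustration):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
points_5d = rng.random((10, 5))  # 10 observations, 5 features

# Condensed vector of all pairwise Euclidean distances: 10 * 9 / 2 = 45 entries
condensed = pdist(points_5d, metric="euclidean")

# The same distance written out by hand for rows 0 and 1:
# nothing here is specific to 2 dimensions
manual_0_1 = np.sqrt(np.sum((points_5d[0] - points_5d[1]) ** 2))

dist_matrix = squareform(condensed)  # full symmetric 10 x 10 matrix
print(condensed.shape)
print(np.isclose(dist_matrix[0, 1], manual_0_1))
```

These distances are all a linkage algorithm consumes, so going from 2 to 5 columns changes nothing about the method.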

Some code:

import pandas as pd
import numpy as np

# 10 observations (rows) of 5 random features (columns)
set_of_values = pd.DataFrame(
    [np.random.rand(10),
     np.random.rand(10),
     np.random.rand(10),
     np.random.rand(10),
     np.random.rand(10)],
    index=['temp differential', 'power differential', 'cost', 'time', 'output'],
    columns=range(10)).transpose()
print(set_of_values)
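With real data these five columns would be in different units (cost, time, power...), and the one with the largest numeric range would dominate any Euclidean distance. A common preprocessing step, an assumption here rather than something from the question, is to z-score each column first. The scale factors below are hypothetical, chosen only to make the effect visible:

```python
import numpy as np
import pandas as pd

columns = ['temp differential', 'power differential', 'cost', 'time', 'output']
rng = np.random.default_rng(3)

# Hypothetical raw data on very different scales (e.g. cost in the thousands)
raw = pd.DataFrame(rng.random((10, 5)) * [1, 1, 1000, 60, 10], columns=columns)

# z-score each column: subtract its mean, divide by its (sample) standard deviation
standardized = (raw - raw.mean()) / raw.std()
print(standardized.round(2))
```

After this every column has mean 0 and standard deviation 1, so each feature contributes comparably to the distances used by the clustering.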

I'd like to find all the clusters over ('temp differential', 'power differential', 'cost', 'time', 'output'), not graphically, since the data live in a 5-dimensional space. Ideally the output would list all the groups, like this:

GROUP #1: (a,b,c,d,e), (a',b',c',d',e'), ... , (a'',b'',c'',d'',e'') 
...
GROUP #n: ('a,'b,'c,'d,'e), ('a,'b,'c,'d,'e), ... , (''a,''b,''c,''d,''e) 

The groups would be determined by a threshold on the progressive (hierarchical) merging. Is this doable?
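A minimal sketch of the requested output, assuming SciPy is available: `linkage` builds the hierarchy from all 5 columns at once, and `fcluster` with `criterion='distance'` cuts it at a threshold. The data are regenerated with a fixed seed, and both the `'average'` linkage method and the threshold value `0.6` are illustrative choices, not the only sensible ones:

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

columns = ['temp differential', 'power differential', 'cost', 'time', 'output']
rng = np.random.default_rng(42)
set_of_values = pd.DataFrame(rng.random((10, 5)), columns=columns)

# Agglomerative clustering on all 5 columns at once
Z = linkage(set_of_values.to_numpy(), method='average', metric='euclidean')

# Cut the hierarchy: two rows share a group only if they were merged
# at a height <= threshold (0.6 is an arbitrary illustrative value)
threshold = 0.6
labels = fcluster(Z, t=threshold, criterion='distance')

groups = {}
for k, members in set_of_values.groupby(labels):
    groups[int(k)] = members.index.tolist()
    rows = ', '.join(str(tuple(round(float(v), 2) for v in row))
                     for row in members.to_numpy())
    print(f"GROUP #{k}: {rows}")
```

Lowering the threshold gives more, tighter groups; raising it above the largest merge height puts everything in one group.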

Asher11

1 Answer


Here's a quick example that clusters 4 random variables with hierarchical clustering:

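# %matplotlib inline   (only needed inside a Jupyter notebook)
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# 50 observations of 4 random variables, col1..col4
df = pd.DataFrame({"col" + str(num): np.random.randn(50) for num in range(1, 5)})
# clustermap clusters both rows and columns hierarchically and draws the dendrograms
sns.clustermap(df)
plt.show()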


If you want to understand the dendrogram linkages and how to threshold them to get clusters: the seaborn tool uses scipy under the hood, and this post would be helpful.
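Roughly what that thresholding looks like when done directly in SciPy. This sketch assumes clustermap's default row settings (`method='average'`, `metric='euclidean'`); the cut height `2.0` and the cluster count `3` are arbitrary example values, and the data are regenerated with a fixed seed:

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

rng = np.random.default_rng(1)
df = pd.DataFrame({"col" + str(num): rng.standard_normal(50) for num in range(1, 5)})

# Same linkage settings as sns.clustermap's defaults, applied to the rows
row_Z = linkage(df.to_numpy(), method='average', metric='euclidean')

# Each row of the linkage matrix is one merge: [cluster_i, cluster_j, height, size]
print(row_Z[:3])

# Option 1: cut the dendrogram at a fixed height
by_height = fcluster(row_Z, t=2.0, criterion='distance')
# Option 2: ask for at most 3 clusters and let SciPy choose the cut height
by_count = fcluster(row_Z, t=3, criterion='maxclust')

# Leaf order as it would be drawn along the dendrogram, without plotting
leaf_order = dendrogram(row_Z, no_plot=True)['leaves']
print(len(set(by_count)), leaf_order[:10])
```

For the column dendrogram, the same calls apply to `df.T`.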

If you want to visualize this in space, I'd recommend using Principal Component Analysis (PCA) and plotting PC1 vs. PC2: http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_iris.html
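A sketch of that projection, written with plain NumPy SVD rather than scikit-learn so it has no extra dependency (scikit-learn's `PCA(n_components=2).fit_transform(X)` should give the same scores up to the sign of each component). The cluster labels come from an assumed `'average'` linkage cut into 3 groups, only so the points have something to be colored by:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
X = rng.random((10, 5))  # 10 observations x 5 features, like set_of_values

labels = fcluster(linkage(X, method='average'), t=3, criterion='maxclust')

# PCA by hand: center each column, then project onto the top right-singular vectors
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
pcs = X_centered @ Vt[:2].T  # column 0 = PC1 scores, column 1 = PC2 scores

# Fraction of the total variance captured by PC1 and PC2
explained = (S[:2] ** 2) / np.sum(S ** 2)
print(pcs.shape, explained)

# To plot (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.scatter(pcs[:, 0], pcs[:, 1], c=labels); plt.xlabel("PC1"); plt.ylabel("PC2"); plt.show()
```

The clustering itself is still done in all 5 dimensions; PCA is only used to get a 2-D picture, and `explained` shows how much of the spread that picture actually retains.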

plfrick