
I have a heatmap and I would like to find some rectangles in it. [image: heatmap]

I have already tried clustermap, but with it I cannot compute these rectangles. The order of the data must not be changed.

This code is not working:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
tips2 = pd.read_csv('heatmapExport.csv')
sns.set(color_codes=True)
g = sns.clustermap(tips2, cbar="false")
plt.show()

Does somebody have an idea?

Thanks a lot

Here is an example of what I want:

[image: heatmap with example clusters marked]

bli
  • What is pd, pandas? And sns? What are the reasons to select those rectangles and not others (without the reasons it will be hard to find a solution)? Why do you say the code is not working, and how does it fail (which error, what is the output compared to the expected output...)? – llrs Oct 18 '17 at 07:15
  • Yes, pd is pandas. I want to calculate the sum of the rectangles. I have a 2D array of float values, and I would like to find clusters in this array. These clusters can be recognized visually in the heatmap. It would be good to have an algorithm that recognizes these shapes and calculates the sum within each cluster. – Johannes Schätzl Oct 18 '17 at 07:42
  • It is therefore important that the points are not viewed independently of each other. – Johannes Schätzl Oct 18 '17 at 08:01
  • Sorry, but I don't understand. Do you want an algorithm that sums the values of manually selected rectangles? Why don't you calculate it by manually selecting the positions of the rectangles? BTW, what is the purpose of this, or the biological question? – llrs Oct 18 '17 at 08:30
  • Okay, I think I phrased this badly. Step 1: The algorithm should form clusters. Step 2: The recheck should be drawn into the heatmap. Step 3: Calculate the sum of the rectangles. – Johannes Schätzl Oct 18 '17 at 08:42
  • What is "the recheck"? How do you define the rectangles? Is a rectangle a cluster of samples and genes? – llrs Oct 18 '17 at 08:57
  • Oh sorry, recheck = rectangle... Sorry typo! – Johannes Schätzl Oct 18 '17 at 09:17
  • A rectangle should delimit a cluster in which n * m data points are located – Johannes Schätzl Oct 18 '17 at 09:19
  • Can you give us a Minimal, Complete, and Verifiable example with some sample data, the rules to find the cluster ('recognizing optically' is not exact enough), and the expected results? – BioGeek Oct 18 '17 at 14:37
  • See the edited question above – Johannes Schätzl Oct 19 '17 at 08:32
  • Your example says "Cluster (for example)". Did you have criteria in mind to delimit the rectangles this way? Does the delimitation of one rectangle influence the delimitation of others (for instance, by forbidding overlaps)? I think your question amounts to being able to formalize an intuition you have about how to form those rectangles. – bli Oct 19 '17 at 08:46
  • Also, edit your question to add an explanation of what you mean by "This Code is not working". That's an important point. – bli Oct 19 '17 at 08:49
  • I agree with @bli. If you could explain how you want to make those clusters, something could be done, but without knowing how you want them we can't help you :( – llrs Oct 19 '17 at 10:24
  • I do not know exactly how to form these clusters; that is one part of my question. The criterion would be to find a minimum number of clusters that "fit" the ... I mean, if you look at the heatmap, you can see these clusters without having any criteria. That is what I want to do. But the algorithm cannot look at the points independently of each other. – Johannes Schätzl Oct 19 '17 at 12:34
  • Does anyone have an idea how to deal with the problem? – Johannes Schätzl Oct 23 '17 at 09:25

1 Answer


Based on your description I think you should have a look at a technique called 'biclustering'.

The example on this page defines the goal of this technique as 'Finding subgroups of rows and columns which are as similar as possible to each other and as different as possible to the rest.'

Since your examples are python-based, you could check out scikit-learn's implementations of biclustering.
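As a rough illustration of that suggestion, here is a minimal sketch using scikit-learn's `SpectralCoclustering`. The synthetic matrix from `make_biclusters` is only a stand-in for the values in `heatmapExport.csv`, the choice of three clusters is an assumption, and `bicluster_sums` is a hypothetical helper written for this example. Note that this approach groups rows and columns regardless of their original positions, so the resulting biclusters are generally not contiguous rectangles in the unsorted heatmap.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

# Synthetic stand-in for the heatmap values (assumption: the real data
# would be loaded from heatmapExport.csv into a 2D float array).
data, _, _ = make_biclusters(
    shape=(30, 20), n_clusters=3, noise=0.5, shuffle=True, random_state=0
)

# Assumption: three biclusters; the real number has to be chosen for the data.
model = SpectralCoclustering(n_clusters=3, random_state=0)
model.fit(data)


def bicluster_sums(matrix, fitted_model):
    """Hypothetical helper: sum the values inside each bicluster."""
    sums = {}
    for k in range(fitted_model.n_clusters):
        # Row and column indices (in the original order) of bicluster k.
        rows, cols = fitted_model.get_indices(k)
        sums[k] = float(matrix[np.ix_(rows, cols)].sum())
    return sums


sums = bicluster_sums(data, model)
for k, total in sums.items():
    rows, cols = model.get_indices(k)
    print(f"bicluster {k}: {len(rows)} rows x {len(cols)} cols, sum = {total:.1f}")
```

The row and column indices returned by `get_indices` refer to the original, unsorted matrix, so the sums can be computed without reordering the data itself.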

holmrenser
  • Many thanks for your response. I have implemented the algorithm as described. It does (almost) what I want. Now I have only one problem: I do not want to re-sort the rows and columns. For example, the algorithm says: Cluster 1 [lines 1, 3, 4; columns 1, 2, 3], Cluster 2 [line 2; column 4]. However, I would like the result the other way around: lines 1, 2, column 1 -> cluster 1; lines 3, 4, columns 2, 3 -> cluster 2, etc. Above all, it is important that the different clusters switch between rows and columns. Once a cluster is finished, no data point should be added to it. – Johannes Schätzl Oct 29 '17 at 13:35
  • I would second scikit-learn as the approach – M__ Jan 30 '19 at 18:13
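
For the order-preserving case raised in the comments, summing a rectangle is straightforward once its boundaries are known. The sketch below only covers that last step: the rectangle coordinates are assumed to be given by hand (it does not find them), and `rectangle_sums` and the toy matrix are hypothetical names and data made up for illustration. The two rectangles mirror the example from the comment above (lines 1, 2 with column 1; lines 3, 4 with columns 2, 3), converted to 0-based half-open ranges.

```python
import numpy as np


def rectangle_sums(matrix, rectangles):
    """Hypothetical helper: sum each contiguous rectangle.

    Each rectangle is (row_start, row_stop, col_start, col_stop),
    0-based with exclusive stops, as in ordinary NumPy slicing.
    """
    return [
        float(matrix[r0:r1, c0:c1].sum())
        for (r0, r1, c0, c1) in rectangles
    ]


# Toy 4 x 4 matrix standing in for the heatmap values.
values = np.arange(16, dtype=float).reshape(4, 4)

# Assumption: rectangle boundaries chosen manually.
rectangles = [
    (0, 2, 0, 1),  # lines 1-2, column 1
    (2, 4, 1, 3),  # lines 3-4, columns 2-3
]

sums = rectangle_sums(values, rectangles)
print(sums)  # [4.0, 46.0]
```

Because only slices are used, the rows and columns keep their original order.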