I have a DataFrame where some columns are correlated and some are not. I want to display only the uncorrelated columns as output. Can anyone help me solve this? I don't want to plot anything, just display the names of the uncorrelated columns.
-
Does this answer your question? [Plot correlation matrix using pandas](https://stackoverflow.com/questions/29432629/plot-correlation-matrix-using-pandas) – I'mahdi Oct 01 '21 at 13:35
-
I want to display the column names which are uncorrelated rather than plotting. – user17051608 Oct 01 '21 at 14:08
2 Answers
You can first compute the correlation matrix with df.corr(), then, for each column, collect the names of the columns whose absolute correlation with it is at or below a threshold.
Try this:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.RandomState(0).rand(10, 10))
corr = df.corr()
# 0 1 2 3 4 5 6 7 8 9
#0 1.000000 0.347533 0.398948 0.455743 0.072914 -0.233402 -0.731222 0.477978 -0.442621 0.015185
#1 0.347533 1.000000 -0.284056 0.571003 -0.285483 0.382480 -0.362842 0.642578 0.252556 0.190047
#2 0.398948 -0.284056 1.000000 -0.523649 0.152937 -0.139176 -0.092895 0.016266 -0.434016 -0.383585
#3 0.455743 0.571003 -0.523649 1.000000 -0.225343 -0.227577 -0.481548 0.473286 0.279258 0.446650
#4 0.072914 -0.285483 0.152937 -0.225343 1.000000 -0.104438 -0.147477 -0.523283 -0.614603 -0.189916
#5 -0.233402 0.382480 -0.139176 -0.227577 -0.104438 1.000000 -0.030252 0.417640 0.205851 0.095084
#6 -0.731222 -0.362842 -0.092895 -0.481548 -0.147477 -0.030252 1.000000 -0.494440 0.381407 -0.353652
#7 0.477978 0.642578 0.016266 0.473286 -0.523283 0.417640 -0.494440 1.000000 0.375873 0.417863
#8 -0.442621 0.252556 -0.434016 0.279258 -0.614603 0.205851 0.381407 0.375873 1.000000 0.150421
#9 0.015185 0.190047 -0.383585 0.446650 -0.189916 0.095084 -0.353652 0.417863 0.150421 1.000000
threshold = 0.2
# For each row, keep the column labels whose absolute correlation is at or below the threshold
uncorr = (corr.abs() <= threshold).apply(lambda row: row[row].index.tolist(), axis=1)
uncorr_df = uncorr.to_frame('col_name_uncorrelated')
# 0 with 4,9 uncorrelated
# 1 with 9 uncorrelated
...
# 9 with 0, 1, 4, 5, 8 uncorrelated
Output:
>>> uncorr_df
col_name_uncorrelated
0 [4, 9]
1 [9]
2 [4, 5, 6, 7]
3 []
4 [0, 2, 5, 6, 9]
5 [2, 4, 6, 9]
6 [2, 4, 5]
7 [2]
8 [9]
9 [0, 1, 4, 5, 8]
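If what you actually need is a flat list of columns that are uncorrelated with every other column, rather than a per-column list, the same threshold idea can be applied to the strongest off-diagonal correlation of each column. The following is only a minimal sketch: the helper name fully_uncorrelated_columns and the tiny example frame are made up for illustration, and "uncorrelated" is assumed to mean an absolute correlation at or below the threshold.

```python
import numpy as np
import pandas as pd

def fully_uncorrelated_columns(df, threshold=0.2):
    """Hypothetical helper: columns whose |correlation| with every
    other column is at or below `threshold`."""
    corr = df.corr().abs()
    # Hide the diagonal, since every column correlates perfectly with itself.
    off_diag = corr.mask(np.eye(len(corr), dtype=bool))
    # max() skips NaN, so this is the strongest correlation with any other column.
    is_uncorrelated = off_diag.max() <= threshold
    return is_uncorrelated[is_uncorrelated].index.tolist()

# Illustrative data: "b" is an exact multiple of "a"; "c" is orthogonal to both.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [1, -1, -1, 1],
})
result = fully_uncorrelated_columns(df, threshold=0.2)
print(result)
```

Here only "c" is returned, because "a" and "b" are perfectly correlated with each other.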
I'mahdi
First, calculate the correlation matrix:
import numpy as np
import pandas as pd
myDataFrame = pd.DataFrame(data)  # data is a placeholder for your own data
correl = myDataFrame.corr()
Next, define what you mean by "uncorrelated". I will treat an absolute correlation below 0.5 as uncorrelated here:
uncor_level = 0.5
The following code will give you the names of the pairs that are uncorrelated:
pairs = np.full([len(correl)**2, 2], None)  # preallocated array to store the results
z = 0
for x in range(len(correl)):  # loop over each row (index)
    for y in range(len(correl)):  # loop over each column
        if abs(correl.iloc[x, y]) < uncor_level:
            pairs[z] = [correl.index[x], correl.columns[y]]
            z += 1
pairs = pairs[:z]  # drop the unused rows that are still None
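Note that the double loop reports every pair twice, once as (x, y) and once as (y, x). If you want each pair only once, one option is to look at just the upper triangle of the matrix. This is a rough sketch rather than a drop-in replacement: the function name uncorrelated_pairs and the sample frame are assumptions made for the example.

```python
import numpy as np
import pandas as pd

def uncorrelated_pairs(df, level=0.5):
    """Hypothetical helper: unique column pairs with |correlation| < level."""
    corr = df.corr().abs()
    # Strict upper triangle (k=1) so the diagonal and mirrored pairs are skipped.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    rows, cols = np.where(upper & (corr.to_numpy() < level))
    return [(corr.index[r], corr.columns[c]) for r, c in zip(rows, cols)]

# Illustrative data: "a" and "b" are perfectly correlated, "c" is orthogonal to both.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [1, -1, -1, 1],
})
pairs = uncorrelated_pairs(df, level=0.5)
print(pairs)
```

Pairs whose correlation is NaN (for example, when a column is constant) are not reported, because a comparison with NaN is always False.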
Anna Pas