Removing matching index values from dataframe

Question

df:

         0         1         2 
0 0.0481948 0.1054251 0.1153076 
1 0.0407258 0.0890868 0.0974378 
2 0.0172071 0.0376403 0.0411687
etc.

I would like to remove every value whose row label and column label are equal, so my expected output would be something like:

         0         1         2 
0 NaN       0.1054251 0.1153076 
1 0.0407258 NaN       0.0974378 
2 0.0172071 0.0376403 NaN
etc.

As shown, the values at (0,0), (1,1), (2,2), and so on have been replaced with NaN.

I thought of looping through the index as follows:

for (idx, row) in df.iterrows():
    if (row.index) == ???

But I don't know how to continue from here, or whether this is even the right approach.

score 4 · Accepted Answer · answered Oct 25 '17 at 22:27

You can set the diagonal:

In [11]: df.iloc[[np.arange(len(df))] * 2] = np.nan

In [12]: df
Out[12]:
          0         1         2
0       NaN  0.105425  0.115308
1  0.040726       NaN  0.097438
2  0.017207  0.037640       NaN
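
For readers on a recent pandas release, where passing a list of arrays to iloc may not behave as in the session above, here is a minimal sketch of the same idea using a boolean mask instead. It assumes a square frame with the three rows from the question; the names `on_diagonal` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# The three rows shown in the question.
df = pd.DataFrame(
    [[0.0481948, 0.1054251, 0.1153076],
     [0.0407258, 0.0890868, 0.0974378],
     [0.0172071, 0.0376403, 0.0411687]]
)

# Boolean matrix that is True only where row position == column position.
on_diagonal = np.eye(len(df), dtype=bool)

# mask() replaces the cells where the condition is True with NaN
# and returns a new frame, leaving df itself untouched.
result = df.mask(on_diagonal)
print(result)
```

Because `mask` returns a new frame, assign it back (`df = df.mask(on_diagonal)`) if you want the in-place effect of the original one-liner.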

Andy Hayden

Thank You. Why `* 2`? :) – ThatOneNoob Oct 25 '17 at 22:29
@LearningToPython the * 2 is because what you really want is `df.iloc[[np.arange(3), np.arange(3)]]` and this saves a little bit of typing! – Andy Hayden Oct 25 '17 at 22:36
Okay, Thanks! I'm a newbie, how would `[np.arange(3), np.arange(3)]` work? ;/ Sorry – ThatOneNoob Oct 25 '17 at 22:41
@LearningToPython it's fancy indexing, and one of the great things in pandas/numpy; see https://pandas.pydata.org/pandas-docs/stable/indexing.html – Andy Hayden Oct 25 '17 at 22:43
Seems complicated ;-) I'll try to understand. Thanks again! – ThatOneNoob Oct 25 '17 at 22:44
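
To unpack the `* 2` question from the comments: multiplying a one-element list by 2 just repeats the element, so `[idx] * 2` is `[idx, idx]`. With NumPy fancy indexing, two index arrays are paired up element by element, giving the cells (0,0), (1,1), (2,2). The sketch below uses a plain NumPy array rather than a DataFrame, and wraps the pair in a tuple, which is what current NumPy expects; `grid`, `idx` and `pair` are illustrative names.

```python
import numpy as np

grid = np.arange(9, dtype=float).reshape(3, 3)
idx = np.arange(3)          # array([0, 1, 2])

# List repetition: [idx] * 2 is the same as [idx, idx].
pair = [idx] * 2

# Fancy indexing pairs the two arrays element-wise:
# rows 0,1,2 with columns 0,1,2, i.e. the diagonal cells.
grid[tuple(pair)] = np.nan  # equivalent to grid[idx, idx] = np.nan
print(grid)
```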

score 2 · Answer 2 · answered Oct 25 '17 at 23:39

@AndyHayden's answer is really cool and taught me something. However, it relies on iloc (positional indexing), and it assumes that the frame is square and that the row and column labels appear in the same order.

I generalized the concept here.

Consider the data frame df:

df = pd.DataFrame(1, list('abcd'), list('xcya'))

df

   x  c  y  a
a  1  1  1  1
b  1  1  1  1
c  1  1  1  1
d  1  1  1  1

Then we use numpy broadcasting and np.where to perform the same fancy index assignment:

ij = np.where(df.index.values[:, None] == df.columns.values)

df.iloc[list(map(list, ij))] = 0

df

   x  c  y  a
a  1  1  1  0
b  1  1  1  1
c  1  0  1  1
d  1  1  1  1
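
If the list-of-lists form of iloc does not work on your pandas version, the broadcast comparison can be used directly as a boolean mask. This is only a sketch of that variant, using the same example frame; `match` and `result` are illustrative names, and `to_numpy(dtype=object)` is used here so the comparison runs on plain NumPy arrays.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(1, index=list('abcd'), columns=list('xcya'))

# Broadcast a column vector of row labels against the column labels:
# match[i, j] is True where the row label equals the column label.
rows = df.index.to_numpy(dtype=object)[:, None]
cols = df.columns.to_numpy(dtype=object)
match = rows == cols

# Replace the matching cells with 0; every other cell is kept as-is.
result = df.mask(match, 0)
print(result)
```

This does not require a square frame or labels in matching order, since it compares labels rather than positions.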

I am surprised that this `df.loc[[(df.index & df.columns)] * 2]` or a variant doesn't work :/ — Andy Hayden, Oct 26 '17 at 01:32

score 0 · Answer 3 · answered Oct 25 '17 at 22:26

Here n is the number of rows, which must equal the number of columns:

df.values[tuple([np.arange(n)] * 2)] = np.nan

or

np.fill_diagonal(df.values, np.nan)

see https://stackoverflow.com/a/24475214/

ehacinom

Oh, Great. Why `* 2`? – ThatOneNoob Oct 25 '17 at 22:28
I'm referring to [np.arange(n)]*2] ;-) by the way – ThatOneNoob Oct 25 '17 at 22:36
Note: fill_diagonal won't always work, as sometimes df.values will be a copy. In fact, the same is true of both your examples. E.g. this happens for mixed data types. – Andy Hayden Oct 25 '17 at 22:39
For example, do `df.iloc[0, 2] = 'a'` before trying. – Andy Hayden Oct 25 '17 at 22:40
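
One way to sidestep the copy issue raised in these comments is to make the copy explicit: fill the diagonal on a separate array and build a new frame from it. The sketch below assumes a small mixed-type frame with made-up values (one cell is the string 'a', as in the comment above); `arr` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

# Made-up mixed-type frame: column 2 holds a string in its first row.
df = pd.DataFrame({0: [0.1, 0.2, 0.3],
                   1: [0.4, 0.5, 0.6],
                   2: ['a', 0.7, 0.8]})

# Take an explicit, writable copy of the values instead of relying on
# df.values being a view of the frame's data.
arr = np.array(df.to_numpy(), copy=True)

# Fill the diagonal of the copy in place.
np.fill_diagonal(arr, np.nan)

# Rebuild a frame with the original labels; df itself is unchanged.
result = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(result)
```

Note that the rebuilt frame has object dtype throughout, so you may want to convert the numeric columns back afterwards.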
