
df:

         0         1         2 
0 0.0481948 0.1054251 0.1153076 
1 0.0407258 0.0890868 0.0974378 
2 0.0172071 0.0376403 0.0411687
etc.

I would like to remove every value whose row label equals its column label, so my expected output would be something like:

         0         1         2 
0 NaN       0.1054251 0.1153076 
1 0.0407258 NaN       0.0974378 
2 0.0172071 0.0376403 NaN
etc.

As shown, the values at (0,0), (1,1), (2,2), and so on have been replaced with NaN.

I thought of looping through the index as follows:

for (idx, row) in df.iterrows():
    if (row.index) == ???

But I don't know how to continue, or whether this is even the right approach.
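For reference, the loop idea can work if you iterate over the row labels directly (instead of iterrows()) and write each matching cell with df.loc. This is a minimal sketch, assuming numeric data and that a row label may or may not also appear as a column label; it is not taken from the answers below and will be slower than their vectorized versions on large frames.

```python
import numpy as np
import pandas as pd

# First three rows/columns of the example data from the question
df = pd.DataFrame(
    [[0.0481948, 0.1054251, 0.1153076],
     [0.0407258, 0.0890868, 0.0974378],
     [0.0172071, 0.0376403, 0.0411687]],
)

# Visit each row label; if a column carries the same label, blank that cell.
for label in df.index:
    if label in df.columns:
        df.loc[label, label] = np.nan

print(df)
```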

ThatOneNoob

3 Answers


You can set the diagonal:

In [11]: df.iloc[:, :] = np.where(np.eye(len(df), dtype=bool), np.nan, df)

In [12]: df
Out[12]:
          0         1         2
0       NaN  0.105425  0.115308
1  0.040726       NaN  0.097438
2  0.017207  0.037640       NaN
Andy Hayden

@AndyHayden's answer is really cool and taught me something. However, it depends on iloc (positions rather than labels), on the frame being square, and on the rows and columns being in the same order.

I generalized the concept here.

Consider the DataFrame df:

df = pd.DataFrame(1, list('abcd'), list('xcya'))

df

   x  c  y  a
a  1  1  1  1
b  1  1  1  1
c  1  1  1  1
d  1  1  1  1

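Then we use NumPy broadcasting and np.where to find every position where the row label equals the column label, and assign to those positions with fancy indexing on a copy of the underlying array:

ij = np.where(df.index.values[:, None] == df.columns.values)

vals = df.to_numpy(copy=True)
vals[ij] = 0  # ij is a (row positions, column positions) tuple
df[:] = vals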

df

   x  c  y  a
a  1  1  1  0
b  1  1  1  1
c  1  0  1  1
d  1  1  1  1
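The same broadcast comparison can also be passed straight to DataFrame.mask, which returns a new frame and never touches positions explicitly. A sketch, assuming you want NaN (as in the question) rather than 0; because NaN is a float, integer data like this example may be upcast to float.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(1, list('abcd'), list('xcya'))

# Boolean array: True where the row label equals the column label.
same_label = df.index.values[:, None] == df.columns.values

# mask() replaces values where the condition is True (NaN by default).
result = df.mask(same_label)
print(result)
```

Since the comparison is done on labels, this works for non-square frames and for rows and columns in any order, and the original df is left unchanged.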
piRSquared

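where n is the number of rows, which must equal the number of columns:

n = len(df)
df.values[tuple([np.arange(n)] * 2)] = np.nan

or

np.fill_diagonal(df.values, np.nan)

NumPy needs the row and column index arrays as a tuple; recent versions do not interpret a plain list that way. Both lines write into df.values, so they only change df when that array is a writable view of the frame's data (for example, a frame with a single float dtype). With pandas Copy-on-Write enabled, the returned array is read-only and the assignment raises an error.

See https://stackoverflow.com/a/24475214/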
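If you cannot rely on df.values being a writable view, a sketch of the same np.fill_diagonal idea applied to an explicit float copy, then wrapped back into a DataFrame with the original labels (this rebuild step is an assumption of the sketch, not part of the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0.0481948, 0.1054251, 0.1153076],
     [0.0407258, 0.0890868, 0.0974378],
     [0.0172071, 0.0376403, 0.0411687]],
)

# Explicit float copy: the write never depends on whether df.values
# is a view, and NaN always fits the dtype.
arr = df.to_numpy(dtype=float, copy=True)
np.fill_diagonal(arr, np.nan)  # sets arr[i, i] for i in range(min(arr.shape))

df = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(df)
```

Like the original, this works by position, so it blanks the (0,0), (1,1), ... cells regardless of the labels.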

ehacinom