148

Say I have the following dataframe:

table

What is the most efficient way to update the values of the columns feat and another_feat where the stream is number 2?

Is this it?

for index, row in df.iterrows():
    if df1.loc[index,'stream'] == 2:
       # do something

UPDATE: What to do if I have more than a 100 columns? I don't want to explicitly name the columns that I want to update. I want to divide the value of each column by 2 (except for the stream column).

So to be clear what my goal is:

Dividing all values by 2 of all rows that have stream 2, but not changing the stream column

jezrael
  • 729,927
  • 78
  • 1,141
  • 1,090
Stanko
  • 3,695
  • 3
  • 20
  • 49

2 Answers2

271

I think you can use loc if you need update two columns to same value:

df1.loc[df1['stream'] == 2, ['feat','another_feat']] = 'aaaa'
print df1
   stream        feat another_feat
a       1  some_value   some_value
b       2        aaaa         aaaa
c       2        aaaa         aaaa
d       3  some_value   some_value

If you need update separate, one option is use:

df1.loc[df1['stream'] == 2, 'feat'] = 10
print df1
   stream        feat another_feat
a       1  some_value   some_value
b       2          10   some_value
c       2          10   some_value
d       3  some_value   some_value

Another common option is use numpy.where:

df1['feat'] = np.where(df1['stream'] == 2, 10,20)
print df1
   stream  feat another_feat
a       1    20   some_value
b       2    10   some_value
c       2    10   some_value
d       3    20   some_value

EDIT: If you need divide all columns without stream where condition is True, use:

print df1
   stream  feat  another_feat
a       1     4             5
b       2     4             5
c       2     2             9
d       3     1             7

#filter columns all without stream
cols = [col for col in df1.columns if col != 'stream']
print cols
['feat', 'another_feat']

df1.loc[df1['stream'] == 2, cols ] = df1 / 2
print df1
   stream  feat  another_feat
a       1   4.0           5.0
b       2   2.0           2.5
c       2   1.0           4.5
d       3   1.0           7.0

If working with multiple conditions is possible use multiple numpy.where or numpy.select:

df0 = pd.DataFrame({'Col':[5,0,-6]})

df0['New Col1'] = np.where((df0['Col'] > 0), 'Increasing', 
                          np.where((df0['Col'] < 0), 'Decreasing', 'No Change'))

df0['New Col2'] = np.select([df0['Col'] > 0, df0['Col'] < 0],
                            ['Increasing',  'Decreasing'], 
                            default='No Change')

print (df0)
   Col    New Col1    New Col2
0    5  Increasing  Increasing
1    0   No Change   No Change
2   -6  Decreasing  Decreasing
jezrael
  • 729,927
  • 78
  • 1,141
  • 1,090
  • 1
    I updated my question, I have more than 100 columns, how could I do this? – Stanko Apr 28 '16 at 09:31
  • 2
    @Stanko - I think it is another question - you need select this `100` columns some way. e.g. if need `100` first columns, use `df.columns[:100]` and then it pass to `loc`. – jezrael Apr 28 '16 at 09:32
  • I don't necessarily want the first 100 columns, I just want to divide all the values of the columns (except the stream column) by 2 where the stream is f.e. 2 – Stanko Apr 28 '16 at 09:35
  • so difference between loc and np.where is that loc changes rows that only satisfy condition but np.where has if and else statement therefore it will change all rows? – haneulkim Jun 03 '20 at 07:36
  • 1
    @Ambleu - exactly. – jezrael Jun 03 '20 at 07:36
  • You can pass a list of columns like this df1['feat'] = np.where(df1['stream'].isin([ 1, 2, 3], 10, 20) – Joe Rivera Dec 17 '20 at 17:55
  • @JoeRivera - You are close, bad closed `)` - `df1['feat'] = np.where(df1['stream'].isin([ 1, 2, 3]), 10, 20)` if need test multiple values – jezrael Dec 18 '20 at 07:49
  • It might be useful to explicitly add that you can set the value to one of more than two outcomes. I was able to change a value in a column in a data frame to one of three options (Increasing, Decreasing, No change) instead of just two. That the third parametre in a np.where() method allows another np.where method, hence to you chain logic together. Example: `df0['New Col'] = np.where((df0['Col'] > 0), 'Increasing', np.where((df0['Col'] < 0), 'Decreasing', 'No Change'))` This was useful because my data was either positive, negative, or zero and I needed to show that inthe result – M H Jan 14 '21 at 01:27
  • 1
    @MH - good idea, added to answer. Also `np.select` alternative. – jezrael Jan 14 '21 at 07:00
3

You can do the same with .ix, like this:

In [1]: df = pd.DataFrame(np.random.randn(5,4), columns=list('abcd'))

In [2]: df
Out[2]: 
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484 -0.905302 -0.435821  1.934512
3  0.266113 -0.034305 -0.110272 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315

In [3]: df.ix[df.a>0, ['b','c']] = 0

In [4]: df
Out[4]: 
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484  0.000000  0.000000  1.934512
3  0.266113  0.000000  0.000000 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315

EDIT

After the extra information, the following will return all columns - where some condition is met - with halved values:

>> condition = df.a > 0
>> df[condition][[i for i in df.columns.values if i not in ['a']]].apply(lambda x: x/2)
Ruli
  • 2,403
  • 12
  • 27
  • 35
Thanos
  • 2,204
  • 1
  • 14
  • 32
  • This is doable if I don't have a lot of columns, I should of said that I have more than 100 columns. – Stanko Apr 28 '16 at 09:30
  • I tested your last edit with `condition = (df.a == -1.001287)` expecting the the values to be divided of the row where `a == -1.001287` but I got back an empty dataframe. – Stanko Apr 28 '16 at 09:56
  • Yes, this is because this is just the display, not the real value, get the real value like this: `df.iloc[1,0]`. Or better yet set the value yourself and then try again: `df.iloc[1,0] = 1.2345; condition = df.a == 1.2345` – Thanos Apr 28 '16 at 10:00
  • I'm not following, why exactly does `condition = (df.a == -1.001287)` not work? – Stanko Apr 28 '16 at 10:19
  • @Stanko `-1.001287` is just for display, not the REAL value, you can see if you try to retrieve the value with `.iloc[]`. In other words the display is clipped. See more on this here: http://stackoverflow.com/questions/11707586/python-pandas-widen-output-display – Thanos Apr 28 '16 at 10:24
  • Ok, I got it working, but this operation returns one row. It does not keep my other rows who are not updated. – Stanko Apr 28 '16 at 10:32
  • jezrael's solution seems like the best approach right now, the indexing part for sure. You could still use either `.ix` or `.loc` – Thanos Apr 28 '16 at 10:40
  • Indeed but thank you for your help, I learned a couple of things during our discussion – Stanko Apr 28 '16 at 10:43
  • @Stanko Check out this thread as well on the precision topic, cheers – Thanos Apr 29 '16 at 10:02
  • 12
    `ix` is now deprecated. – dbliss Jul 04 '17 at 05:26