
I have a dataframe like this:

In[1]: df
Out[1]:
      A      B       C            D
1   blue    red    square        NaN
2  orange  yellow  circle        NaN
3  black   grey    circle        NaN

and I want to update column D for rows that meet 3 conditions. For example:

df.ix[ np.logical_and(df.A=='blue', df.B=='red', df.C=='square'), ['D'] ] = 'succeed'

It works for the first two conditions, but the third one is ignored, so:

df.ix[ np.logical_and(df.A=='blue', df.B=='red', df.C=='triangle'), ['D'] ] = 'succeed'

has exactly the same result:

In[1]: df
Out[1]:
      A      B       C            D
1   blue    red    square        succeed
2  orange  yellow  circle        NaN
3  black   grey    circle        NaN
Timur Shtatland
Eduardo Oliveira

5 Answers


Using:

df[ (df.A=='blue') & (df.B=='red') & (df.C=='square') ]['D'] = 'succeed'

gives the warning:

/usr/local/lib/python2.7/dist-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

A better way of achieving this seems to be:

df.loc[(df['A'] == 'blue') & (df['B'] == 'red') & (df['C'] == 'square'), 'D'] = 'succeed'
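
A minimal, self-contained sketch of the `.loc` approach, rebuilding the question's frame by hand (index 1 to 3). One assumption beyond the question: `D` is created as an object column of NaN, so string values can be written into it without upcasting a float column.

```python
import numpy as np
import pandas as pd

# The question's data, retyped; D is an object column (assumption) that starts out as NaN
df = pd.DataFrame(
    {
        "A": ["blue", "orange", "black"],
        "B": ["red", "yellow", "grey"],
        "C": ["square", "circle", "circle"],
        "D": pd.Series(np.nan, index=[1, 2, 3], dtype=object),
    },
    index=[1, 2, 3],
)

# Combine the three conditions element-wise with &; each comparison gets its own parentheses
mask = (df["A"] == "blue") & (df["B"] == "red") & (df["C"] == "square")

# A single .loc[row_mask, column] assignment writes into df itself, not into a temporary copy
df.loc[mask, "D"] = "succeed"
print(df)
```

Only row 1 matches all three conditions, so it is the only row whose `D` changes.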
Ena
Praveen
  • df.loc is a much faster way than the previous one. – hodophile Apr 19 '18 at 18:36
  • This is a very descriptive and useful answer, Thanks for that – TapanHP Jun 20 '19 at 12:12
  • This is the idiomatic solution. Another option is to use a callable inside `loc` like so: `df.loc[lambda x: (x['A'] == 'blue') & (x['B'] == 'red') & (x['C'] == 'square'), 'D'] = 'succeed'`. Remember: `df.loc[row_mask, cols] = assigned_val` – jonchar Aug 04 '19 at 01:50
  • This solution seems to be the standard solution but it takes forever. In my case, the assignment operation never finishes. Is it possible to speed up this operation? – Arul Oct 13 '21 at 15:30
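
The callable form from jonchar's comment can be sketched the same way. `.loc` calls the lambda with the frame and uses the boolean Series it returns as the row selector, which is mainly handy inside method chains where the intermediate frame has no name. The frame is again the question's data retyped, with `D` assumed to be an object column.

```python
import numpy as np
import pandas as pd

# The question's data, retyped; D as an object column of NaN is an assumption
df = pd.DataFrame(
    {
        "A": ["blue", "orange", "black"],
        "B": ["red", "yellow", "grey"],
        "C": ["square", "circle", "circle"],
        "D": pd.Series(np.nan, index=[1, 2, 3], dtype=object),
    },
    index=[1, 2, 3],
)

# .loc evaluates the lambda against df and uses the returned boolean Series as the row mask
df.loc[lambda d: (d["A"] == "blue") & (d["B"] == "red") & (d["C"] == "square"), "D"] = "succeed"

# The same callable style works for reading, e.g. selecting the rows that were updated
selected = df.loc[lambda d: d["D"].notna()]
```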

You could try this instead:

df[ (df.A=='blue') & (df.B=='red') & (df.C=='square') ]['D'] = 'succeed'
Tim
  • Yes, it works. Though I still don't get the difference between "numpy.logical_and" and "&". Thank you – Eduardo Oliveira Jan 21 '14 at 16:30
  • Also, you can express OR in a query with `|` (not the `or` keyword), like `df[ (df.A=='blue') | (df.B=='red') ]` – Tim Jan 21 '14 at 17:11
  • If you get the warning `A value is trying to be set on a copy of a slice from a DataFrame.` while using the above solution, do this instead: `df.loc[ (df.A=='blue') & (df.B=='red') & (df.C=='square'), 'D'] = 'succeed'` – Shams Mar 15 '17 at 05:57
  • Just as a side note to anyone who runs into this and thinks they're going to trim it down...you MUST have the parentheses around the conditions or operator precedence will cause issues. Voice of experience. :-P – Brent Writes Code Apr 02 '18 at 02:45
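
To make the comments above concrete: `&` and `|` combine boolean Series element-wise, and each comparison needs its own parentheses because `&` binds more tightly than `==` in Python. `np.logical_and` computes the same thing as `&` but takes only two inputs, so three conditions have to be nested. A small sketch on the question's data, retyped:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["blue", "orange", "black"],
        "B": ["red", "yellow", "grey"],
        "C": ["square", "circle", "circle"],
    },
    index=[1, 2, 3],
)

# AND of three conditions; the parentheses are required because & binds tighter than ==
all_three = (df.A == "blue") & (df.B == "red") & (df.C == "square")

# OR uses |; the keyword `or` would call bool() on a Series, which raises ValueError
either = (df.A == "blue") | (df.B == "grey")

# np.logical_and matches &, but only for two inputs, so nest it to combine three
nested = np.logical_and(np.logical_and(df.A == "blue", df.B == "red"), df.C == "square")
```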

You could try:

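# np.where needs a value for both outcomes; rows that fail the condition keep their current D (cast to object so NaN stays NaN)
df['D'] = np.where((df.A=='blue') & (df.B=='red') & (df.C=='square'), 'succeed', df['D'].astype(object))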

This answer might provide a more detailed answer to your question: Update row values where certain condition is met in pandas
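
A runnable sketch of the `np.where` approach on the question's data, retyped. `np.where(cond, x, y)` needs a value for rows where the condition holds and one for rows where it does not; passing the existing column as `y` leaves unmatched rows unchanged. The `astype(object)` cast is a precaution, assuming `D` is an all-NaN float column as in the question: it stops NumPy from having to combine a string with a float array, which depending on the NumPy version can turn NaN into the string `'nan'`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["blue", "orange", "black"],
        "B": ["red", "yellow", "grey"],
        "C": ["square", "circle", "circle"],
        "D": np.nan,  # float column of NaN, as in the question
    },
    index=[1, 2, 3],
)

mask = (df.A == "blue") & (df.B == "red") & (df.C == "square")

# Rows where mask is True get 'succeed'; all other rows keep their current D value.
# The object cast keeps NaN as a real missing value rather than a string.
df["D"] = np.where(mask, "succeed", df["D"].astype(object))
```

Unlike `.loc`, this rebuilds the whole column, which is convenient when you also want to fill the non-matching rows with something else (just replace the third argument).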

Aryan Firouzian
theSanjeev

This form may already be implied by the other answers, but the following actually worked for me.

df['D'].loc[(df['A'] == 'blue') & (df['B'] == 'red') & (df['C'] == 'square')] = 'succeed'


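The third positional parameter of logical_and is `out`, the array used to store the result, so `df.C=='square'` is treated as the output buffer rather than as a third condition. logical_and only combines two inputs at a time.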
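
The point about the third argument can be checked with plain NumPy arrays. Here `a`, `b` and `c` are hypothetical stand-ins for the three conditions, with `c` all False like `df.C=='triangle'`. Passed positionally, `c` becomes the `out` buffer and is overwritten with `a AND b` instead of being tested; `np.logical_and.reduce` is one way to AND any number of conditions.

```python
import numpy as np

# Hypothetical stand-ins for df.A=='blue', df.B=='red' and df.C=='triangle'
a = np.array([True, False, False])
b = np.array([True, False, False])
c = np.array([False, False, False])

# Passed positionally, c becomes the `out` buffer: a AND b is written into it and c itself is returned
wrong = np.logical_and(a, b, c)
print(c)  # [ True False False]

# To AND three or more conditions, reduce over them (a fresh array, since c was just overwritten)
triangle = np.array([False, False, False])
right = np.logical_and.reduce([a, b, triangle])
print(right)  # [False False False]
```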

Currently, the method @TimRich provided might be the best. In pandas 0.13 (in development), there's a new experimental `query` method. Try it!
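
A short sketch of the `query` route, which is available in current pandas releases as well. `query` returns a filtered copy, so here it is only used to find the matching index labels, which are then assigned through `.loc` on the original frame. The frame is the question's data retyped, with `D` assumed to be an object column.

```python
import numpy as np
import pandas as pd

# The question's data, retyped; D as an object column of NaN is an assumption
df = pd.DataFrame(
    {
        "A": ["blue", "orange", "black"],
        "B": ["red", "yellow", "grey"],
        "C": ["square", "circle", "circle"],
        "D": pd.Series(np.nan, index=[1, 2, 3], dtype=object),
    },
    index=[1, 2, 3],
)

# Inside a query string, `and` / `or` are allowed, so no extra parentheses are needed here
matches = df.query("A == 'blue' and B == 'red' and C == 'square'").index

# Assign through the original frame; writing to the query result would only change a copy
df.loc[matches, "D"] = "succeed"
```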

waitingkuo