214

I am filtering rows in a dataframe by values in two columns.

For some reason the OR operator behaves like I would expect AND operator to behave and vice versa.

My test code:

import pandas as pd

df = pd.DataFrame({'a': range(5), 'b': range(5) })

# let's insert some -1 values
df['a'][1] = -1
df['b'][1] = -1
df['a'][3] = -1
df['b'][4] = -1

df1 = df[(df.a != -1) & (df.b != -1)]
df2 = df[(df.a != -1) | (df.b != -1)]

print pd.concat([df, df1, df2], axis=1,
                keys = [ 'original df', 'using AND (&)', 'using OR (|)',])

And the result:

      original df      using AND (&)      using OR (|)    
             a  b              a   b             a   b
0            0  0              0   0             0   0
1           -1 -1            NaN NaN           NaN NaN
2            2  2              2   2             2   2
3           -1  3            NaN NaN            -1   3
4            4 -1            NaN NaN             4  -1

[5 rows x 6 columns]

As you can see, the AND operator drops every row in which at least one value equals -1. On the other hand, the OR operator requires both values to be equal to -1 to drop them. I would expect exactly the opposite result. Could anyone explain this behavior, please?

I am using pandas 0.13.1.

Wojciech Walczak
  • 3,130
  • 2
  • 23
  • 23
  • 1
    `df.query` and `pd.eval` seem like good fits for this use case. For information on the `pd.eval()` family of functions, their features and use cases, please visit [Dynamic Expression Evaluation in pandas using pd.eval()](https://stackoverflow.com/questions/53779986/dynamic-expression-evaluation-in-pandas-using-pd-eval). – cs95 Dec 16 '18 at 04:55

3 Answers3

328

As you can see, the AND operator drops every row in which at least one value equals -1. On the other hand, the OR operator requires both values to be equal to -1 to drop them.

That's right. Remember that you're writing the condition in terms of what you want to keep, not in terms of what you want to drop. For df1:

df1 = df[(df.a != -1) & (df.b != -1)]

You're saying "keep the rows in which df.a isn't -1 and df.b isn't -1", which is the same as dropping every row in which at least one value is -1.

For df2:

df2 = df[(df.a != -1) | (df.b != -1)]

You're saying "keep the rows in which either df.a or df.b is not -1", which is the same as dropping rows where both values are -1.

PS: chained access like df['a'][1] = -1 can get you into trouble. It's better to get into the habit of using .loc and .iloc.

DSM
  • 319,184
  • 61
  • 566
  • 472
  • 31
    `DataFrame.query()` works nicely here too. `df.query('a != -1 or b != -1')`. – Phillip Cloud Mar 23 '14 at 15:34
  • 8
    Happen to know why pandas wants `&` and `|` over `and` and `or`? – stoves Jan 31 '17 at 02:26
  • 6
    @stoves: in normal Python code, `and` and `or` have basic Python semantics that can't be modified. `&` and `|`, on the other hand, have corresponding special methods which control their behaviour. (In query strings, of course, we're free to apply any parsing we like.) – DSM Jan 31 '17 at 03:02
  • 1
    interestingly, it seems like `df[True & False]` fails but `df[(True) & (False)]` succeeds (not tested on this example) – 3pitt Feb 15 '18 at 20:26
  • 4
    Would it be possible to break this kind of syntax across multiple lines? What would be most PEP8? – tommy.carstensen Aug 31 '18 at 22:06
64

You can also use query(), i.e.:

df_filtered = df.query('a == 4 & b != 2')
Pedro Lobito
  • 85,689
  • 29
  • 230
  • 253
  • I have a situation where I think this syntax makes more sense e.g.: df.query(''(a==4 & b!=2) | c== 3") – Aus_10 Feb 28 '20 at 00:43
  • As `&` has a higher [operator precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence) than `==`, is it special for `.query()` not to interpret the boolean as something like `a == (4 & b)` ? – SeaBean Jun 21 '21 at 07:41
17

A little mathematical logic theory here:

"NOT a AND NOT b" is the same as "NOT (a OR b)", so:

"a NOT -1 AND b NOT -1" is equivalent of "NOT (a is -1 OR b is -1)", which is opposite (Complement) of "(a is -1 OR b is -1)".

So if you want exact opposite result, df1 and df2 should be as below:

df1 = df[(df.a != -1) & (df.b != -1)]
df2 = df[(df.a == -1) | (df.b == -1)]
automa7
  • 456
  • 4
  • 15
Jake
  • 1,319
  • 1
  • 10
  • 12