
I have a dictionary containing several pandas boolean masks, stored as strings, for a specific dataframe, but I can't find a way to use them.

Here is a short reproducible example:

import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

masks = {'old_strong' : "(df['age'] >18) & (df['strength'] >5)",
        'young_weak' : "(df['age'] <18) & (df['strength'] <5)"}

And I would like to do something like:

df[masks['young_weak']]

But since the mask is a string, I get the error:

KeyError: "(df['age'] <18) & (df['strength'] <5)"
smci
vlemaistre

3 Answers


Use DataFrame.query with a changed dictionary that refers to the columns by bare name:

masks = {'old_strong' : "(age >18) & (strength >5)",
        'young_weak' : "(age <18) & (strength <5)"}

print(df.query(masks['young_weak']))
   age  strength
0   10         0
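A minimal sketch extending this answer, assuming you want every mask in the dictionary applied at once. It also shows that `DataFrame.query` can reference local Python variables with the `@` prefix, so thresholds need not be hard-coded into the strings. The names `filtered` and `adult_age` are hypothetical, chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

# Query strings refer to columns by bare name instead of df['...']
masks = {'old_strong': "(age > 18) & (strength > 5)",
         'young_weak': "(age < 18) & (strength < 5)"}

# Filter once per mask and keep each result under the same key
filtered = {name: df.query(expr) for name, expr in masks.items()}

# Local Python variables can be referenced inside a query with the @ prefix
adult_age = 18  # hypothetical threshold variable
adults = df.query("age >= @adult_age")

for name, subset in filtered.items():
    print(name)
    print(subset)
```

With the sample data, `old_strong` keeps only the row with age 35 and `young_weak` keeps only the row with age 10.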
jezrael
  • Nice one! But do you know if there is a way to do it without changing the dict, or by changing the dict to transform the strings into masks? – vlemaistre Jun 05 '19 at 08:37
  • @vlemaistre - Unfortunately, changing the dictionary is necessary here. – jezrael Jun 05 '19 at 08:38
  • Thanks for your answer @jezrael, I'll accept the one with eval() (even though it's less clean than your solution) because it was closer to what I was looking for. But this helped me too :) – vlemaistre Jun 05 '19 at 08:45
  • @vlemaistre - yes, it is up to you - [Why is using 'eval' a bad practice?](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice) – jezrael Jun 05 '19 at 08:46
  • Just for reference... `.query` is just using `pd.eval` under the hood anyway, according to the [docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html)... – Chris Adams Jun 05 '19 at 08:46
  • @ChrisA - yes, but there are some extra checks, because if I change `eval` to `pd.eval` it fails. – jezrael Jun 05 '19 at 08:47
  • @jezrael Your answer is much better than mine, but since his requirements can only be fulfilled with `eval`, I used it. Your answer is very good; you deserve the vote score you have now. – U12-Forward Jun 05 '19 at 08:51
  • @U9-Forward - yep, maybe you can edit your answer to say it is possible, but strongly recommend not doing it. ;) – jezrael Jun 05 '19 at 08:53
  • @jezrael Added a little more to it. – U12-Forward Jun 05 '19 at 08:55
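Regarding vlemaistre's question about keeping the original dictionary: one option is to rewrite the strings into `query` syntax automatically. This is a sketch under a stated assumption: every column reference looks exactly like `df['col']` or `df["col"]` with `col` a plain Python identifier. The helper name `to_query` and the pattern `COLUMN_REF` are hypothetical.

```python
import re

import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

# The original dictionary, left unchanged
masks = {'old_strong': "(df['age'] >18) & (df['strength'] >5)",
         'young_weak': "(df['age'] <18) & (df['strength'] <5)"}

# Matches df['col'] or df["col"]; \1 requires the closing quote to match the opening one
COLUMN_REF = re.compile(r"""df\[(['"])(\w+)\1\]""")


def to_query(expr: str) -> str:
    """Rewrite df['col'] references into bare column names for DataFrame.query."""
    return COLUMN_REF.sub(r"\2", expr)


query_masks = {name: to_query(expr) for name, expr in masks.items()}
print(query_masks['young_weak'])  # (age <18) & (strength <5)
print(df.query(query_masks['young_weak']))
```

Column names that are not valid identifiers (for example, names containing spaces) would not match this pattern. Recent pandas versions let `query` reference such columns with backticks, but the regex would need adjusting to emit them.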

Another way is to set up the masks as functions (lambda expressions) instead of strings. This works:

masks = {'old_strong': lambda d: (d['age'] > 18) & (d['strength'] > 5),
         'young_weak': lambda d: (d['age'] < 18) & (d['strength'] < 5)}

# df[callable] calls the function once with the whole DataFrame, not row by row
df[masks['young_weak']]
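A short sketch of why the function-based masks are convenient: because `df[...]` and `df.loc[...]` both accept a callable and pass it the DataFrame, the same dictionary works on any frame with the same columns, and masks can be combined by calling the functions directly. The second frame `other` is hypothetical data, used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

# Each mask is a function taking the whole DataFrame and returning a boolean Series
masks = {'old_strong': lambda d: (d['age'] > 18) & (d['strength'] > 5),
         'young_weak': lambda d: (d['age'] < 18) & (d['strength'] < 5)}

# df[callable] calls the function with df and filters with the returned mask
young_weak = df[masks['young_weak']]

# The same functions apply to any frame with matching columns (hypothetical data)
other = pd.DataFrame({'age': [12, 40], 'strength': [2, 8]})
old_strong_other = other.loc[masks['old_strong']]

# Combine masks by calling the functions and using boolean operators on the results
either = df[masks['old_strong'](df) | masks['young_weak'](df)]
print(either)
```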
Itamar Mushkin

This is an unsafe solution and very bad practice, but if you want to use the strings exactly as they are, you can use eval:

print(df[eval(masks['young_weak'])])

Output:

   age  strength
0   10         0

Here is the link to the reason it's bad.
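If you do go the eval route, a slightly more contained sketch passes explicit namespaces so the expression only sees `df`. This is an assumption-laden mitigation, not a fix: emptying `__builtins__` blocks casual calls such as `open()`, but it is not a security sandbox, so the strings must still come from a trusted source. The helper name `apply_mask` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

masks = {'old_strong': "(df['age'] >18) & (df['strength'] >5)",
         'young_weak': "(df['age'] <18) & (df['strength'] <5)"}


def apply_mask(frame: pd.DataFrame, expr: str) -> pd.DataFrame:
    """Evaluate a trusted mask string with only `df` in scope, then filter `frame`.

    Emptying __builtins__ limits what the string can call, but it is NOT a
    sandbox; never pass untrusted strings to this function.
    """
    mask = eval(expr, {"__builtins__": {}}, {"df": frame})
    return frame[mask]


print(apply_mask(df, masks['young_weak']))
```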

U12-Forward