0

Is there a way to automate the filtering process in pandas based on the columns and list of associated values the columns are supposed to take?

Please, see the example here:

import pandas as pd
import seaborn as sns

df = sns.load_dataset('tips')

columns = ['sex']
values = ['Female']

df.query(@columns in @values)

columns2 = ['sex', 'smoker']
values2 = ['Male', 'Yes']
df.query(@columns2 in @values2)

Of course, this does not work but is there a way to do so?

The point is that the solution should be generalized as the length of the lists may differ.

Petr
  • 1,164
  • 1
  • 9
  • 26

1 Answers1

2
  • you need to conform with requirements of query
  • this is straight forward with a list comprehension and f-strings
import pandas as pd
import seaborn as sns

df = sns.load_dataset('tips')

columns = ['sex']
values = ['Female']

df.query(" and ".join([f"{c}=='{values[i]}'" for i,c in enumerate(columns)]))

columns2 = ['sex', 'smoker']
values2 = ['Male', 'Yes']
df.query(" and ".join([f"{c}=='{values2[i]}'" for i,c in enumerate(columns2)]))

Rob Raymond
  • 23,623
  • 2
  • 11
  • 24
  • 1
    I feel `zip` might be more clear here `" and ".join([f"{c}=='{v}'" for c, v in zip(columns2, values2)])` but otherwise I agree this is the way to go. – Henry Ecker Jun 11 '21 at 16:33
  • `join` can also take generators, so you can omit the extra `[]`, e.g.`" and ".join(f"{c}=='{values[i]}'" for i,c in enumerate(columns))` – fsimonjetz Jun 11 '21 at 17:18