Pandas query based on the list of columns and values

Question

Is there a way to automate the filtering process in pandas based on the columns and list of associated values the columns are supposed to take?

Please, see the example here:

import pandas as pd
import seaborn as sns

df = sns.load_dataset('tips')

columns = ['sex']
values = ['Female']

df.query(@columns in @values)

columns2 = ['sex', 'smoker']
values2 = ['Male', 'Yes']
df.query(@columns2 in @values2)

Of course, this does not work but is there a way to do so?

The point is that the solution should be generalized as the length of the lists may differ.

I think you're looking for [`loc`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) — fsimonjetz, Jun 11 '21 at 16:04
`df.query("sex=='Female'")` and `df.query("sex=='Male' and smoker=='Yes'")` — Anurag Dabas, Jun 11 '21 at 16:07
Hi all, no I know about filtering "by hand" but I need one universal solution. — Petr, Jun 11 '21 at 16:09

score 2 · Answer 1 · answered Jun 11 '21 at 16:25

2

you need to conform with requirements of query
this is straight forward with a list comprehension and f-strings

import pandas as pd
import seaborn as sns

df = sns.load_dataset('tips')

columns = ['sex']
values = ['Female']

df.query(" and ".join([f"{c}=='{values[i]}'" for i,c in enumerate(columns)]))

columns2 = ['sex', 'smoker']
values2 = ['Male', 'Yes']
df.query(" and ".join([f"{c}=='{values2[i]}'" for i,c in enumerate(columns2)]))

answered Jun 11 '21 at 16:25

Rob Raymond

23,623
2
11
24

1

I feel `zip` might be more clear here `" and ".join([f"{c}=='{v}'" for c, v in zip(columns2, values2)])` but otherwise I agree this is the way to go. – Henry Ecker Jun 11 '21 at 16:33
`join` can also take generators, so you can omit the extra `[]`, e.g.`" and ".join(f"{c}=='{values[i]}'" for i,c in enumerate(columns))` – fsimonjetz Jun 11 '21 at 17:18

Pandas query based on the list of columns and values

1 Answers1