Use a list of values to select rows from a Pandas dataframe

Question

Let’s say I have the following Pandas dataframe:

df = DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3, 5]})
df

     A   B
0    5   1
1    6   2
2    3   3
3    4   5

I can subset based on a specific value:

x = df[df['A'] == 3]
x

     A   B
2    3   3

But how can I subset based on a list of values? - something like this:

list_of_values = [3,6]

y = df[df['A'] in list_of_values]

To get:

     A    B
1    6    2
2    3    3

score 1872 · Accepted Answer · edited Jun 06 '21 at 19:23

1872

You can use the isin method:

In [1]: df = pd.DataFrame({'A': [5,6,3,4], 'B': [1,2,3,5]})

In [2]: df
Out[2]:
   A  B
0  5  1
1  6  2
2  3  3
3  4  5

In [3]: df[df['A'].isin([3, 6])]
Out[3]:
   A  B
1  6  2
2  3  3

And to get the opposite use ~:

In [4]: df[~df['A'].isin([3, 6])]
Out[4]:
   A  B
0  5  1
3  4  5

edited Jun 06 '21 at 19:23

Peter Mortensen

30,030
21
100
124

answered Aug 23 '12 at 19:20

Wouter Overmeire

59,036
9
61
42

27

How would you return these values in the order of the list? For example, `list_of_values` has values 3 then 6 but the frame is returned with 6 then 3. I'm not talking about a simple sort, rather how specifically can we return in the order of the values in the list. – Jason Strimpel Aug 14 '14 at 17:36
1

This was an example of boolean indexing which keeps the order off the index, see http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-with-isin for more details. A sort after the selection is needed. – Wouter Overmeire Aug 18 '14 at 15:16
1

This helped me http://stackoverflow.com/a/29108799/5629831 – Philippe Remy May 25 '16 at 03:39
9

You can also achieve similar results by using 'query' and @: eg: df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']}) df = pd.DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3, 5]}) list_of_values = [3,6] result= df.query("A in @list_of_values") result A B 1 6 2 2 3 3 – akuriako Sep 28 '17 at 03:05
2

@JasonStrimpel I replied to your question here: https://stackoverflow.com/questions/51944021/select-rows-of-pandas-dataframe-from-list-in-order-of-list/51944022#51944022 – syltruong Aug 21 '18 at 07:51
1

If you wanted to filter from both columns you could also do it like this: df[(~df['A'].isin([5]))&(~df['B'].isin([5]))] – sparrow Sep 19 '18 at 19:20
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#the-in-and-not-in-operators this helped me, hope this will help you as well – chaitanya Apr 09 '20 at 05:46
@JasonStrimpel I added here a solution if you want to keep the order of the list: https://stackoverflow.com/a/63918237/7183444 – Billy Bonaros Sep 16 '20 at 10:41
I have a doubt regarding the use of this function. Why does using: df[df.isin([3, 6])] returns the full dataframe with 'NaN' columns and the selected values, but df[df['A'].isin([3, 6])] returns only the selected values? – marcogemaque Nov 12 '20 at 14:45
Is there a way to get duplicates if an item appears in the list twice? For example list_of_values = [3, 6, 6] and you want to get three records in the resulting dataframe? – Nick Koprowicz Aug 19 '21 at 17:49

Mykola Zotko · Answer 2 · 2021-09-12T19:40:15.473

50

You can use the method query:

df.query('A in [6, 3]')
# df.query('A == [6, 3]')

or

lst = [6, 3]
df.query('A in @lst')
# df.query('A == @lst')

edited Sep 12 '21 at 19:40

answered May 03 '21 at 05:07

Mykola Zotko

12,250
2
39
53

3

i wonder if `query()` is computationally better than `isin()` function – Hammad Aug 29 '21 at 14:51
3

@Hammad According to Pandas docs: "DataFrame.query() using numexpr is slightly faster than Python for large frames." – Mykola Zotko Aug 30 '21 at 08:21
1

how would i combine two queries? df.query('A in @lst') AND df.query('B in @lst2') – data_runner Oct 27 '21 at 14:22
2

@data_runner `df.query('(A in @lst) and (B in @lst2)')` – Mykola Zotko Oct 27 '21 at 14:38
if the column contains strings or lists, can we slice inline in a query? for instance this wont work: `df.query('(A[:2] in @lst')` – Subham Burnwal Jan 06 '22 at 13:16
1

@SubhamBurnwal You can't use square brackets inside a query, but you can use the method `slice` instead: `df.query('A.str.slice(stop=2) in @lst')` – Mykola Zotko Jan 06 '22 at 15:52
@Mykola for me work: `df.query('(A in @lst) or (B in @lst2)')` and `df.query('A.str.slice(stop=2) in @lst')` return AttributeError: Can only use .str accessor with string values! pd.__version__:'1.3.5' – rubengavidia0x Feb 08 '22 at 00:20

score 5 · Answer 3 · edited Jun 06 '21 at 19:26

5

Another method;

df.loc[df.apply(lambda x: x.A in [3,6], axis=1)]

Unlike the isin method, this is particularly useful in determining if the list contains a function of the column A. For example, f(A) = 2*A - 5 as the function;

df.loc[df.apply(lambda x: 2*x.A-5 in [3,6], axis=1)]

It should be noted that this approach is slower than the isin method.

edited Jun 06 '21 at 19:26

Peter Mortensen

30,030
21
100
124

answered May 24 '21 at 15:08

Achintha Ihalage

1,938
3
15
29

The use of `df.loc[df.apply(lambda x: ... ` is very powerful. Did the trick for me. And yes, it is slower but more flexible :) – serfer2 Dec 06 '21 at 18:12

score -1 · Answer 4 · answered May 26 '22 at 04:34

-1

Ypu can store your values in a list as:

lis = [3,6]

then

df1 = df[df['A'].isin(lis)]

answered May 26 '22 at 04:34

dare_devils

1,575
1
13
15

What's the difference between top answer? – Ynjxsjmh May 26 '22 at 04:37
if you have more values to filter, it is better to store them as `list` and filter that `list` from the main dataframe. – dare_devils May 26 '22 at 04:42

Use a list of values to select rows from a Pandas dataframe

4 Answers4

Linked

Related