Iterating over all columns of dataframe to find list of strings

Question

Suppose I have the following df:

df = pd.DataFrame({
    'col1':['x1','x2','x3'],
    'col2':['y1','y2','y3'],
    'col3':['z1','z2','z3'],
    'col4':['a1','b2','c3']
})

and a list of elements:

l = ['x1','x2','y3']

I want to search elements of l in all the columns of my df, as it stands from my list x1 and x2 appear in col1 and y3 is in col2, so I did:

df.loc[df['col1'].apply(lambda x: True if any(i in x for i in l) else False)|
       df['col2'].apply(lambda x: True if any(i in x for i in l) else False)]

which gives me

  col1 col2 col3 col4
0   x1   y1   z1   a1
1   x2   y2   z2   b2
2   x3   y3   z3   c3

as desired but the above method needs me to make a | operator for each column. So I wonder how can I do this iteration over all columns efficiently without using | for every column?

richardec · Accepted Answer · 2022-02-02T00:17:04.003

2

A much, much more efficient way of doing this would be to use numpy broadcasting.

row_mask = (df.to_numpy() == l[:, None, None]).sum(axis=0).any(axis=1)
filtered = df[row_mask]

Output:

>>> filtered
  col1 col2 col3 col4
0   x1   y1   z1   a1
1   x2   y2   z2   b2
2   x3   y3   z3   c3

edited Feb 02 '22 at 00:17

answered Feb 02 '22 at 00:07

richardec

14,202
6
23
49

Iterating over all columns of dataframe to find list of strings

1 Answers1