4

I would like to delete rows that contain only values that are less than 10 and greater than 25. My sample dataframe will look like this:

a   b   c  
1   2   3  
4   5   16  
11  24  22  
26  50  65  

Expected Output:

a   b   c  
1   2   3  
4   5   16   
26  50  65  

So if the row contains any value less than 10 or greater than 25, then the row will stay in dataframe, otherwise, it needs to be dropped.

Is there any way I can achieve this with Pandas instead of iterating through all the rows?

thisisbhavin
  • 314
  • 3
  • 13
Jaswanth Kumar
  • 3,321
  • 3
  • 22
  • 25
  • Good luck. I hope you get your rows taken care of. – Mad Physicist Jun 14 '17 at 18:52
  • 1
    Just as an FYI, this is not how you ask a question on SO. You should show some minimal effort on your part. If you have the resources to ask a question here, you have the resources to do a little searching yourself. – Mad Physicist Jun 14 '17 at 18:58
  • why are the first and last rows still in your expected output? this doesn't make any sense. – Paul H Jun 14 '17 at 19:02
  • Can you explain your deletion criteria better? Do you drop a row if **any** of the values are less than 10 and greater than 25? or **all** of the values? The example input and output doesn't make sense – Rakesh Adhikesavan Jun 14 '17 at 19:06
  • If the row has any one value which is less than 10 or greater than 25, then the row will stay in dataframe else it needs to be dropped. – Jaswanth Kumar Jun 14 '17 at 19:13
  • @PaulH Sorry for the confusion..just updated the question. – Jaswanth Kumar Jun 14 '17 at 19:17
  • It's clarified but the end of the question now contradicts the beginning. – Paul H Jun 14 '17 at 19:36

2 Answers2

6

You can call apply and return the results to a new column called 'Keep'. You can then use this column to drop rows that you don't need.

import pandas as pd
l = [[1,2,3],[4,5,6],[11,24,22],[26,50,65]]
df = pd.DataFrame(l, columns = ['a','b','c']) #Set up sample dataFrame

df['keep'] = df.apply(lambda row: sum(any([(x < 10) or (x > 25) for x in row])), axis = 1)

The any() function returns a generator. Calling sum(generator) simply returns the sum of all the results stored in the generator.

Check this on how any() works. Apply function still iterates over all the rows like a for loop, but the code looks cleaner this way. I cannot think of a way to do this without iterating over all the rows.

Output:

    a   b   c  keep
0   1   2   3     1
1   4   5   6     1
2  11  24  22     0
3  26  50  65     1


df = df[df['keep'] == 1] #Drop unwanted rows
Rakesh Adhikesavan
  • 10,627
  • 16
  • 49
  • 71
  • Great answer! I'm new to pandas and was wondering if this is more efficient to "just loop" over the dataframe. I don't know how panda's dataframe is implemented under the hood but this seems like iterating over it twice. Could you please point me to some explanation on that? – mimre Jun 14 '17 at 19:56
  • Apply function is just syntactic sugar for a loop. Even if you wrote a for loop to iterate over all rows, I am not sure if there is a way to drop rows on the fly. – Rakesh Adhikesavan Jun 14 '17 at 20:14
4

You can use pandas boolean indexing

dropped_df = df.loc[((df<10) | (df>25)).any(1)]
  • df<10 will return a boolean df
  • | is the OR operator
  • .any(1) returns any true element over the axis 1 (rows) see documentation
  • df.loc[] then filters the dataframe based on the boolean df
Prageeth Jayathissa
  • 1,448
  • 8
  • 16