I created a function to drop my outliers. Here is the function:

def dropping_outliers(train, condition):
    drop_index = train[condition].index
    #print(drop_index)
    train = train.drop(drop_index,axis = 0)

and when I do

dropping_outliers(train, ((train.SalePrice<100000)  & (train.LotFrontage>150)))

nothing is dropped. However, when I run the function's steps manually, i.e. look up the index in the dataframe for this condition, I do get a valid index (943), and when I do

train = train.drop([943],axis = 0)

then the row I want is dropped correctly. I don't understand why the function doesn't work, as it's supposed to be doing exactly what I am doing manually.

1 Answer

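At the end of dropping_outliers, the result of drop is assigned to the local variable train, which does not alter the dataframe passed in: drop returns a new dataframe by default, and the rebound local name is discarded when the function returns. Try this instead: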

def dropping_outliers(train, condition):
    drop_index = train[condition].index
    #print(drop_index)
    return train.drop(drop_index,axis = 0)

Then do the assignment when you call the function.

train = dropping_outliers(train, ((train.SalePrice<100000)  & (train.LotFrontage>150)))
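To see the difference side by side, here is a minimal sketch. The three-row dataframe is made-up toy data (only the column names SalePrice and LotFrontage come from the question), and dropping_outliers_local is a hypothetical name for the original, non-returning version.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
train = pd.DataFrame(
    {"SalePrice": [200000, 90000, 150000], "LotFrontage": [80, 160, 60]}
)

def dropping_outliers_local(df, condition):
    # Original behaviour: rebinds the local name only,
    # so the caller's dataframe is left untouched.
    df = df.drop(df[condition].index, axis=0)

def dropping_outliers(df, condition):
    # Fixed behaviour: hand the new dataframe back to the caller.
    return df.drop(df[condition].index, axis=0)

condition = (train.SalePrice < 100000) & (train.LotFrontage > 150)

dropping_outliers_local(train, condition)
rows_after_local = len(train)   # nothing was dropped

train = dropping_outliers(train, condition)
rows_after_return = len(train)  # the matching row is gone
```

With this toy data, only the row at index 1 matches the condition, so the first call leaves all three rows in place and the second leaves two.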

Also see "python pandas dataframe, is it pass-by-value or pass-by-reference".
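As a rough illustration of that distinction: the function receives a reference to the same dataframe object, so mutating that object is visible to the caller, while rebinding the parameter name is not. The sketch below uses drop with inplace=True to mutate the object. The data and the name drop_in_place are assumptions for illustration, and returning a new dataframe is usually the easier pattern to reason about.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
frame = pd.DataFrame({"SalePrice": [200000, 90000], "LotFrontage": [80, 160]})

def drop_in_place(df, condition):
    # inplace=True mutates the object the caller passed in;
    # drop then returns None, so there is nothing to assign.
    df.drop(df[condition].index, axis=0, inplace=True)

drop_in_place(frame, (frame.SalePrice < 100000) & (frame.LotFrontage > 150))
remaining = list(frame.index)  # index labels left in the caller's dataframe
```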

Bill the Lizard