Pandas outlier removal on rows only in specific columns

Asked Mar 25 '22 at 06:42

Active Mar 27 '22 at 14:36

Viewed 112 times

I have a pandas DataFrame that combines 3 lists of different lengths, each with 4 columns, so the columns come in three blocks of 4. I would like to remove the upper and lower outliers using quantiles, but without removing entire rows.

What I want to do instead is:

If there is an outlier in any of the first 4 columns, remove or replace the values in that row for those 4 columns only, then do the same for each of the following blocks of 4 columns.
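A minimal sketch of this block-wise rule on the combined DataFrame, assuming every column is numeric and the blocks are consecutive groups of 4 columns in order. The function name `mask_block_outliers` and its parameters are hypothetical; the 0.01/0.99 defaults mirror the quantiles used in the question. For each block it computes per-column quantile bounds, flags rows where any value in the block falls outside them, and sets only that block's cells in those rows to NaN:

```python
import numpy as np
import pandas as pd


def mask_block_outliers(df: pd.DataFrame, block_size: int = 4,
                        low: float = 0.01, high: float = 0.99) -> pd.DataFrame:
    """Hypothetical helper: within each block of `block_size` columns, set a
    row's values to NaN if any value of that row in the block lies outside
    its column's [low, high] quantile range. Other blocks are untouched."""
    # Assumes all columns are numeric; cast to float so NaN can be stored.
    out = df.astype(float)
    for start in range(0, df.shape[1], block_size):
        cols = df.columns[start:start + block_size]
        block = out[cols]
        # Per-column bounds, as Series indexed by column name (NaN is skipped)
        q_low = block.quantile(low)
        q_hi = block.quantile(high)
        # True where a value is below/above its own column's bound;
        # comparisons with NaN are False, so padding is never flagged
        is_outlier = block.lt(q_low, axis=1) | block.gt(q_hi, axis=1)
        # Rows with at least one outlier anywhere in this block
        bad_rows = is_outlier.any(axis=1)
        # Blank out only this block's cells in those rows
        out.loc[bad_rows, cols] = np.nan
    return out
```

Values exactly equal to a bound are kept here, unlike the strict `<`/`>` filter in the question; swapping `lt`/`gt` for `le`/`ge` would match it.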

I suppose the easiest way would be to use three separate DataFrames, filter the outliers in each one separately, and then join them back into a single DataFrame.

EDIT: Handling the three datasets separately seems to be the best option.
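The separate-datasets route can be sketched as follows, assuming each of the three lists has already been turned into its own numeric DataFrame with column names that do not clash across datasets. `drop_outlier_rows` and `filter_and_join` are hypothetical helpers: each dataset is filtered with the same quantile idea as in the question, applied to all of its columns at once, and the results are joined side by side with `pd.concat(..., axis=1)`:

```python
import pandas as pd


def drop_outlier_rows(block: pd.DataFrame, low: float = 0.01,
                      high: float = 0.99) -> pd.DataFrame:
    """Hypothetical helper: drop every row in which any column lies outside
    that column's [low, high] quantile range (the bounds themselves are kept)."""
    q_low = block.quantile(low)
    q_hi = block.quantile(high)
    # axis=1 lines the per-column bounds up with the DataFrame's columns
    inside = block.ge(q_low, axis=1) & block.le(q_hi, axis=1)
    return block[inside.all(axis=1)]


def filter_and_join(frames: list[pd.DataFrame], low: float = 0.01,
                    high: float = 0.99) -> pd.DataFrame:
    """Hypothetical helper: filter each dataset on its own, then place the
    results side by side. pd.concat(axis=1) does an outer join on the index."""
    filtered = [drop_outlier_rows(f, low, high) for f in frames]
    return pd.concat(filtered, axis=1)
```

Because `pd.concat` aligns on the index, a row dropped from one dataset (or missing because that list was shorter) shows up as NaN in that dataset's columns only, while the other datasets keep their values for that row. A row removed from every dataset disappears entirely.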

I tried the approach suggested in the solution here, but it produces NaNs in every column other than the one being filtered. It could still be a starting point:

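import pandas as pd

# Bounds for one column: its 1st and 99th percentiles
q_low = df["col"].quantile(0.01)
q_hi = df["col"].quantile(0.99)

# Keep only rows where "col" lies strictly between the bounds (drops entire rows)
df_filtered = df[(df["col"] < q_hi) & (df["col"] > q_low)]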
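If replacing is preferred to removing, one alternative (swapped in here, not taken from the question) is to winsorize: clip each column to its own quantile bounds instead of filtering rows with the snippet above. `clip_to_quantiles` is a hypothetical name; the sketch relies on `DataFrame.quantile` and on `DataFrame.clip` with `axis=1`, so that the per-column bounds line up with the columns:

```python
import pandas as pd


def clip_to_quantiles(block: pd.DataFrame, low: float = 0.01,
                      high: float = 0.99) -> pd.DataFrame:
    """Hypothetical helper: replace (rather than remove) outliers. Values
    below a column's `low` quantile are set to that quantile, values above
    its `high` quantile are set to that quantile. No rows are dropped."""
    q_low = block.quantile(low)
    q_hi = block.quantile(high)
    # axis=1 aligns the per-column bound Series with the DataFrame's columns
    return block.clip(lower=q_low, upper=q_hi, axis=1)
```

No rows or cells become NaN this way, and each column is clipped independently, so it can be applied to each block or to the whole combined frame with the same result.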

edited Mar 27 '22 at 14:36

marc_s


asked by iLikeCentroids


0 Answers