I have a pandas DataFrame that combines 3 lists of different lengths, each with 4 columns. I would like to remove the upper and lower outliers from the DataFrame using quantiles,
but without removing an entire row when doing so.
What I would like to do instead is:
If there is an outlier in the first 4 columns, remove or replace the values in that row for those 4 columns only, and then do the same for the remaining columns, always in blocks of 4.
I suppose the easiest way would be to use three separate DataFrames, filter the outliers in each of them separately, and then join them into a single one.
EDIT: Doing it separately on three different datasets seems to be the best option.
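A minimal sketch of that split, filter, and join idea could look like the following. The helper name `filter_outlier_rows`, the sample data, and the 0.05/0.95 cut-offs are assumptions made for illustration (two columns per block instead of four, to keep it short); the strict comparisons mirror the single-column filter further down.

```python
import pandas as pd

def filter_outlier_rows(block, low=0.01, high=0.99):
    """Drop rows of one block where any column falls outside its own quantile range."""
    q_low = block.quantile(low)   # one lower bound per column
    q_hi = block.quantile(high)   # one upper bound per column
    # Compare each column against its own bounds (strict, as in the original filter)
    inside = block.gt(q_low, axis=1) & block.lt(q_hi, axis=1)
    # Keep a row only if every column of the block is inside its bounds
    return block[inside.all(axis=1)].reset_index(drop=True)

# Hypothetical blocks of different lengths
a = pd.DataFrame({"a1": [1, 2, 3, 4, 100], "a2": [5, 6, 7, 8, 9]})
b = pd.DataFrame({"b1": [10, 11, 12, 13, 14, 15, 16]})
c = pd.DataFrame({"c1": [1, 2, 3, 4]})

# Filter each block on its own, then join side by side;
# shorter blocks are padded with NaN at the bottom
combined = pd.concat(
    [filter_outlier_rows(blk, 0.05, 0.95) for blk in (a, b, c)],
    axis=1,
)
print(combined)
```

Because each block is filtered before the join, an outlier in one block never removes values from another block. Note that on very small samples like these, quantile cut-offs with strict comparisons always drop each column's minimum and maximum.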
I tried the approach suggested in the solution here, but it produces NaNs in every column that isn't the outlier column. Still, it could be a starting point:
q_low = df["col"].quantile(0.01)
q_hi = df["col"].quantile(0.99)
df_filtered = df[(df["col"] < q_hi) & (df["col"] > q_low)]
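One possible way to extend the filter above to blocks of columns, without splitting the DataFrame, is to blank out only the affected block in each offending row. This is a sketch under assumptions: the function name `mask_block_outliers`, the `block_size` parameter, and the sample data are hypothetical, all columns are assumed numeric with unique names, and the outlier values are replaced with NaN rather than removed.

```python
import numpy as np
import pandas as pd

def mask_block_outliers(df, block_size=4, low=0.01, high=0.99):
    """Set a whole block to NaN in rows where any column of that block is an outlier."""
    out = df.astype(float)  # copy of the data; float so NaN can be stored
    for start in range(0, df.shape[1], block_size):
        cols = out.columns[start:start + block_size]
        block = df[cols]
        q_low = block.quantile(low)   # per-column lower bounds
        q_hi = block.quantile(high)   # per-column upper bounds
        inside = block.gt(q_low, axis=1) & block.lt(q_hi, axis=1)
        # Existing NaN padding (from the shorter lists) is not treated as an outlier
        ok = inside | block.isna()
        bad_rows = ~ok.all(axis=1)
        # Only this block's columns are touched; other blocks in the row stay intact
        out.loc[bad_rows, cols] = np.nan
    return out

# Hypothetical frame with two blocks of two columns each
df = pd.DataFrame({
    "x1": [1, 2, 3, 4, 100], "x2": [5, 6, 7, 8, 9],
    "y1": [30, 10, 20, 50, 40], "y2": [3, 1, 2, 5, 4],
})
result = mask_block_outliers(df, block_size=2, low=0.05, high=0.95)
print(result)
```

Here rows 0 and 4 lose only their `x1`/`x2` values and rows 1 and 3 lose only their `y1`/`y2` values, so the frame keeps its shape. If the rows should be compacted afterwards, each block would still have to be shifted up separately.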