I am using both the IQR rule and a Z-score threshold (|z| > 3, i.e. more than 3 SD from the mean) for outlier detection.

It seems like the Z-score threshold is more conservative and flags fewer outliers than IQR, which is better for my purposes (regression, Airbnb price prediction).
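For concreteness, here is a minimal sketch of the two rules being compared. The helper names `iqr_outliers` and `zscore_outliers` are made up for illustration, the 1.5 multiplier is the conventional IQR fence and an assumption here, and the lognormal `price` series is synthetic stand-in data, not the Airbnb data.

```python
import numpy as np
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 is assumed)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

def zscore_outliers(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Flag values more than `threshold` sample standard deviations from the mean."""
    z = (s - s.mean()) / s.std()
    return z.abs() > threshold

# Synthetic right-skewed "price" column, only to show the two rules side by side.
rng = np.random.default_rng(0)
price = pd.Series(rng.lognormal(mean=4.5, sigma=0.8, size=10_000), name="price")

iqr_mask = iqr_outliers(price)
z_mask = zscore_outliers(price)
print("IQR rule flags:    ", int(iqr_mask.sum()))
print("Z-score rule flags:", int(z_mask.sum()))
```

On right-skewed data like this, the long tail inflates the standard deviation, so the 3 SD cutoff sits further out than the IQR fence and flags fewer points, which matches what I am seeing.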

However, the Z-score method still detects a lot of outliers. Is there a more sophisticated/systematic way to decide which of these outliers to remove? Some have suggested just eyeballing a boxplot/histogram/scatterplot, but I find that unsatisfying because it is not quantitative. I am hoping there is a more systematic approach that I can follow and use as a first "screening" step in most problems similar to this.

Below is my dataframe (dumped as a dict) of how many outliers are found in each feature. I could obviously remove them all, but that is also unsatisfying:

{'Column': {0: 'price',
  1: 'minimum_nights',
  2: 'date_of_review_year',
  3: 'last_review_year',
  4: 'number_of_reviews',
  5: 'reviews_per_month',
  6: 'number_of_reviews_ltm',
  7: 'listing_id',
  8: 'id',
  9: 'calculated_host_listings_count',
  10: 'host_id'},
 'Outlier Count': {0: 575,
  1: 1053,
  2: 1107,
  3: 1515,
  4: 2236,
  5: 4075,
  6: 4799,
  7: 8052,
  8: 8573,
  9: 8925,
  10: 10844},
 'Percentage': {0: 0.2152816258068381,
  1: 0.39424617734713135,
  2: 0.41446393003159965,
  3: 0.5672202836475821,
  4: 0.8371647222679826,
  5: 1.5256915220223743,
  6: 1.7967591691252454,
  7: 3.0146915669507135,
  8: 3.2097554400730837,
  9: 3.3415452353496176,
  10: 4.060024261303222},
 'Max Allowable Value': {0: 1903.2474464485751,
  1: 81.1471012097081,
  2: 2028.4407611958543,
  3: 2029.3504386277002,
  4: 793.1969405905453,
  5: 13.758606675183554,
  6: 193.4008759750987,
  7: 3.518988602860466e+17,
  8: 3.6651180886284006e+17,
  9: 35.72229351455516,
  10: 382071191.0582348},
 'Min Allowable Value': {0: -1517.6718245654786,
  1: -66.25626209809113,
  2: 2009.3519993510733,
  3: 2015.490896942066,
  4: -326.6803844975137,
  5: -6.8684181831946525,
  6: -113.21529946962494,
  7: -3.129647879837069e+17,
  8: -3.248157618682771e+17,
  9: -26.539592422796517,
  10: -248407483.67581975}}
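A summary like the one above can be produced roughly as follows. This is a sketch, not the exact code I ran: the function name `zscore_outlier_summary` is made up for illustration, and it assumes the bounds are mean ± 3 sample SD and that the percentage is taken over all rows of the dataframe.

```python
import pandas as pd

def zscore_outlier_summary(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Per numeric column: count of values outside mean +/- threshold * SD."""
    rows = []
    for col in df.select_dtypes(include="number").columns:
        s = df[col].dropna()
        mean, std = s.mean(), s.std()
        upper = mean + threshold * std
        lower = mean - threshold * std
        n_out = int(((s < lower) | (s > upper)).sum())
        rows.append({
            "Column": col,
            "Outlier Count": n_out,
            # Assumption: percentage of all rows, in percent (not a fraction).
            "Percentage": 100 * n_out / len(df),
            "Max Allowable Value": upper,
            "Min Allowable Value": lower,
        })
    return (pd.DataFrame(rows)
            .sort_values("Outlier Count")
            .reset_index(drop=True))

# Tiny made-up frame just to show the output shape.
demo = pd.DataFrame({"a": list(range(20)) + [1000], "b": [1.0] * 21})
print(zscore_outlier_summary(demo))
```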
Katsu