I am fitting a regression model, and I need to delete outliers. However, when I drew a boxplot, it flagged at least 100 observations as outliers (I only have 700 observations in total). Both the y and x variables have right-skewed distributions.

Boxplot (y is total_cast):

boxplot(data)

[Image: boxplot of the data]
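For context, a minimal sketch of why a boxplot can flag so many points in right-skewed data. By default it draws individually any point lying more than roughly 1.5 times the box length beyond the box, which is a display convention rather than a rule for deleting observations. The data below are simulated and purely hypothetical; `y` only stands in for a skewed variable like the one in the question.

```r
# Simulated right-skewed data (hypothetical stand-in for the real variable)
set.seed(1)
y <- rlnorm(700, meanlog = 0, sdlog = 1)

# boxplot.stats() returns the points beyond the whiskers in $out
flagged <- boxplot.stats(y)$out
length(flagged)

# The same rule applied on the log scale flags far fewer points
flagged_log <- boxplot.stats(log(y))$out
length(flagged_log)
```

If the number of flagged points drops sharply on the log scale, that suggests the flags reflect skewness rather than bad observations.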

I did not delete anything and tried fitting the model first. Only 2 variables have highly significant p-values (three stars); another 3 variables are only weakly significant (one star). R-squared is only 0.5, which does not seem good enough.

So I checked the diagnostic plot:

plot(cost_model, which = 2)

[Image: normal Q-Q plot of the cost_model residuals]

Does this mean I only have to delete 3 outliers from this data set, namely rows 45, 107 and 147? I do not understand what these numbers mean or how to delete those rows.
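For reference, a hedged sketch of what those labels are and how rows could be dropped. As far as I understand, `plot()` on an `lm` object always labels the `id.n = 3` most extreme points by default, using the row names of the data, so the labels do not by themselves mean those rows are outliers. The data frame `dat` and the variables `x` and `y` below are hypothetical stand-ins, not the real data.

```r
# Hypothetical data standing in for the real data frame
set.seed(1)
dat <- data.frame(x = rlnorm(200))
dat$y <- 2 * dat$x + rlnorm(200)
fit <- lm(y ~ x, data = dat)

# Row names of the three largest absolute standardized residuals,
# i.e. the kind of points the Q-Q plot labels by default
r <- rstandard(fit)
top3 <- names(sort(abs(r), decreasing = TRUE))[1:3]
top3

# Refit without those rows (matching by row name, not by position)
fit2 <- lm(y ~ x, data = dat[!(rownames(dat) %in% top3), ])
```

Matching on row names rather than positions avoids dropping the wrong rows if the data were subset or reordered before fitting.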

Ching
  • Why do you need to delete the outliers? // Why is $R^2\approx 0.5$ not good enough? Because $50\%$ is an $\text{F}$ grade in school? – Dave Dec 14 '20 at 12:03
  • @Dave Hi! The professor wants to know how we deal with outliers, and since the R-squared is not high (that is, not close to 1), I want to delete the outliers to see whether the model improves. – Ching Dec 14 '20 at 12:21
  • Why do you think your $R^2$ value is low? // It sounds like you have just one predictor (independent) variable. Please post a scatter plot of your data (assuming nothing is proprietary or otherwise confidential). – Dave Dec 14 '20 at 12:22
  • @Dave In the data I analyzed before, the models normally had an R-squared of at least 0.7. That's why I think 0.5 is a bit low. – Ching Dec 14 '20 at 12:25
  • "I need to delete outliers" No, you don't. If Total_cost is your dependent variable, you most likely should transform it or use a GLM. – Roland Dec 14 '20 at 15:41
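A minimal sketch of the two options named in the last comment, assuming a strictly positive, right-skewed response. The data are simulated, and the names `dat`, `x` and `total_cost` as well as the choice of a Gamma family with a log link are assumptions for illustration, not something confirmed in the thread.

```r
# Simulated positive, right-skewed data (hypothetical)
set.seed(1)
dat <- data.frame(x = rlnorm(300))
dat$total_cost <- exp(1 + 0.5 * log(dat$x) + rnorm(300, sd = 0.5))

# Option 1: linear model on the log scale
fit_log <- lm(log(total_cost) ~ log(x), data = dat)

# Option 2: GLM on the original scale with a log link
fit_glm <- glm(total_cost ~ log(x), family = Gamma(link = "log"), data = dat)

summary(fit_log)$r.squared
coef(fit_glm)
```

Either fit can then be checked with the same diagnostic plots, for example `plot(fit_log, which = 2)`, to see whether the tails still look heavy once skewness is accounted for.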

0 Answers