I am trying to model sales as a function of several variables (debt, number of employees, competitors, etc.). To do this, I transformed both the dependent and the independent variables using the natural logarithm.

The problem is that the residuals are not normally distributed, as indicated by both their plot and the Shapiro-Wilk test.

I imagine that the log transform can also affect the residuals: could this explain their lack of normality?

The other model statistics look good: adjusted R² = 0.92, the F-test is significant, the residual standard error is 0.5, and the mean of the residuals is 0.

Edit:

Size of dataset: N = 4403; 8 variables in the model: 3 continuous, 5 discrete

cremorna
  • How big is your data set? Can you show us a Q-Q plot? Rejecting normality is (1) often unimportant and (2) almost inevitable with a big data set. – Ben Bolker Mar 01 '22 at 21:38
  • Edited the original post. Thank you for your comment! – cremorna Mar 01 '22 at 21:52
  • 1. Don't try to interpret a Q-Q plot without first examining the "prior" plots for the fit of the mean and for heteroskedasticity (in R, residuals vs fitted and scale-location at a minimum). The Q-Q plot is only interpretable if the fit and conditional variance assumptions are reasonable. 2. If all that's okay and there are no omitted but potentially important covariates/predictors, you might find that a log-link gamma GLM (with logged x) is a better fit for the conditional distribution. – Glen_b Mar 01 '22 at 23:28
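
The point made in the comments, that rejecting normality is almost inevitable with a big data set, can be illustrated with a small simulation. This is only a rough sketch under stated assumptions: the "residuals" are hypothetical centered gamma draws (shape 50, so only mildly skewed), not the data from the question, and the test is a hand-written Jarque-Bera check rather than Shapiro-Wilk, chosen because it needs nothing beyond the Python standard library. The function names `jarque_bera` and `mildly_skewed_residuals` are made up for this example.

```python
import math
import random


def jarque_bera(sample):
    """Return (JB statistic, p-value) for a skewness/kurtosis normality check."""
    n = len(sample)
    mean = sum(sample) / n
    centered = [x - mean for x in sample]
    # Central moments (population form, dividing by n).
    m2 = sum(d ** 2 for d in centered) / n
    m3 = sum(d ** 3 for d in centered) / n
    m4 = sum(d ** 4 for d in centered) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


def mildly_skewed_residuals(n, rng):
    """Hypothetical residuals: gamma(shape=50) draws centered at their mean.

    The theoretical skewness is 2 / sqrt(50), about 0.28, so the shape is
    close to normal but not exactly normal.
    """
    shape = 50.0
    return [rng.gammavariate(shape, 1.0) - shape for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is reproducible
results = {}
for n in (50, 4403):  # 4403 matches the sample size in the question
    results[n] = jarque_bera(mildly_skewed_residuals(n, rng))
    jb, p = results[n]
    print(f"n={n:5d}  JB={jb:8.2f}  p={p:.4g}")
```

Both samples come from the same mildly skewed distribution, yet the test statistic scales with n, so the N = 4403 sample is rejected decisively while a small sample frequently is not. A rejection at this sample size therefore says little about whether the departure from normality is large enough to matter, which is why the diagnostic plots are more informative than the test.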