
I'm doing a Simple Logistic Regression using R. The description of the variable is listed below:

   vars    n     mean      sd median trimmed     mad min    max  range skew kurtosis      se
X1    1 1382 14410.84 39410.4 2352.5 5507.66 3235.03   7 440000 439993 5.83     43.1 1060.12


The outcome:

   alive deceased 
    1341       66 

The data is highly right-skewed. As far as I understand, logistic regression doesn't really care about normality. However, with data this skewed, should I transform the predictor, or is that not required?

And if I should transform, how do I know which transformation to use, and how to interpret it?


Maxim

1 Answer


Without a graph of the variable, all I can say is that logistic regression doesn't really care about the predictor's distribution as long as the response is linear, i.e. the log-odds of the outcome are roughly linear in the predictor. You can standardize (normalize) the predictor so that the fitted weights stay on a manageable scale, which eases the optimizer's job. Before that you can also log-transform it, which just pulls the extreme values closer together. But as long as the response is linear on the log-odds scale, it should be fine.
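As a rough sketch of what that could look like in R (base R only): the data below are simulated stand-ins, and the names `x` and `deceased` are hypothetical, not the asker's actual columns. Using `log2()` is just one convenient choice, because the exponentiated coefficient then reads as the odds ratio for a doubling of the predictor.

```r
# Simulated stand-in data (assumption): a strictly positive, right-skewed
# predictor and a rare binary outcome, roughly like the question's setup.
set.seed(1)
n <- 1382
x <- round(exp(rnorm(n, mean = 8, sd = 1.8))) + 7
deceased <- rbinom(n, 1, plogis(-4.5 + 0.25 * (log(x) - 8)))

# Fit on the raw scale and on the log2 scale.
fit_raw <- glm(deceased ~ x, family = binomial)
fit_log <- glm(deceased ~ log2(x), family = binomial)

# With log2(x) as the predictor, exp(coefficient) is the multiplicative
# change in the odds when x doubles.
or_doubling <- exp(coef(fit_log)[["log2(x)"]])

# Standardizing only rescales the coefficient (now per 1 SD of log2(x));
# the fitted probabilities and the deviance stay the same.
z <- as.numeric(scale(log2(x)))
fit_std <- glm(deceased ~ z, family = binomial)

# Compare the two scales; a lower AIC suggests a better-fitting scale.
print(AIC(fit_raw, fit_log))
print(or_doubling)
```

With a natural-log transform instead, a coefficient `b` means that multiplying `x` by a factor `k` multiplies the odds by `k^b`, so the odds ratio is no longer "per one unit of `x`" but "per fold-change in `x`".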

R. Prost
  • How do I know if the response is linear? I updated the question with the plot of the model. If I log-transform the data, how does it change the interpretation of the odds ratio? – user6401955 Jan 16 '18 at 22:22
  • I would just use plot(variable) or a boxplot to see to what extent there are outliers. For the response, you can look at a weight-of-evidence plot, or directly at the log-odds. You should be able to visually fit a straight line (well, that's what logistic regression does). The last plots you showed are not very meaningful for logistic regression – R. Prost Jan 16 '18 at 22:33
  • And there is a similar discussion there (https://stats.stackexchange.com/q/67078/191128) – R. Prost Jan 16 '18 at 22:37
  • I should say "helpful" rather than meaningful. – R. Prost Jan 24 '18 at 06:20
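
One way to look "directly at the log-odds", as suggested in the comments, is to bin the predictor and plot the empirical log-odds per bin. This is only a sketch on simulated data: `x` and `deceased` are hypothetical names, the choice of 8 quantile bins is arbitrary, and 0.5 is added to the counts so that bins with no events do not produce `log(0)`.

```r
# Simulated stand-in data (assumption), not the asker's data set.
set.seed(2)
n <- 1382
x <- round(exp(rnorm(n, mean = 8, sd = 1.8))) + 7
deceased <- rbinom(n, 1, plogis(-3 + 0.3 * (log(x) - 8)))

# Cut the predictor into (roughly) equal-sized quantile bins.
breaks <- unique(quantile(x, probs = seq(0, 1, length.out = 9)))
bin <- cut(x, breaks = breaks, include.lowest = TRUE)

# Events and totals per bin, then the empirical log-odds with a
# 0.5 continuity correction.
events <- as.numeric(tapply(deceased, bin, sum))
total <- as.numeric(tapply(deceased, bin, length))
emp_logit <- log((events + 0.5) / (total - events + 0.5))

# Median of log(x) in each bin serves as the x-coordinate.
mid <- as.numeric(tapply(log(x), bin, median))

# If the points fall close to a straight line, the linearity assumption
# looks reasonable on this scale.
plot(mid, emp_logit,
     xlab = "median log(x) in bin", ylab = "empirical log-odds")
abline(lm(emp_logit ~ mid), lty = 2)
```

Repeating the plot with the bin medians of raw `x` instead of `log(x)` gives a quick visual comparison of which scale looks closer to a straight line.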