
I've seen people log-normalize data using $\log(1+x)$ (np.log1p), for instance when normalizing the price of diamonds in the diamonds dataset.

  1. If the loss function is RMSE, then normalizing with $\log$ is akin to using an RMSLE error. Is there a similar insight when normalizing with $\log(1+x)$?
  2. When should I use $\log(1+x)$ rather than $\log(x)$?
    • What if I am guaranteed not to have 0-values, or values very close to 0, in my outcome variable?
  • If the variable you want to take the log of is your outcome variable in a regression, I suggest using Poisson regression instead, which is fine with zeros. Taking $\ln(1+x)$ or ${\rm asinh}(x)$ is a workaround that avoids thinking very seriously about the data-generating process. https://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/ – Student Nov 08 '19 at 00:48
  • @Student that article is too technical for me. It's unclear what the benefits are in my case (where there are no 0s in the outcome variable). When you say $\ln(1+x)$ is a workaround... a workaround to what? – Aviad Rozenhek Nov 09 '19 at 19:01
  • I deleted my comments because I think they were just more confusing. Long story short, if you don't have zeros, you can take the log directly. $\ln(1+x)$ has no theoretical justification that should make you prefer it. – Student Nov 09 '19 at 19:33
  • I second "Student"'s comment. This is often a legacy approach from the times before GLMs (1980s and before). Now we can choose among diverse distributions for $y$. And if you want to log-transform and have no 0s, drop the +1. – Carsten Nov 09 '19 at 20:29

1 Answer


As Student mentions in the comments, use $\log(x)$ if you don't have zeroes. If you do have zeroes, though, $\log(1+x)$ is NOT a good solution, as explained well in this thread: "Interpreting log-log regression with log(1+x) as independent variable". A better solution if there are zeroes is to use a square root transformation, and if there are negative values, a cube root transformation.
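
To make the points above concrete, here is a minimal NumPy sketch. The prices are made-up illustrative numbers, not taken from the diamonds dataset, and `rmse` / `rmsle` are hypothetical helper names. It assumes the common definition of RMSLE that uses $\log(1+y)$, under which RMSE on `np.log1p`-transformed targets is the same quantity as RMSLE on the raw targets. It also shows that for large positive values $\log(1+x)$ and $\log(x)$ barely differ, and that the square root and cube root are defined at zero (and, for the cube root, at negative values).

```python
import numpy as np


def rmse(a, b):
    """Root mean squared error between two array-likes."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))


def rmsle(y_true, y_pred):
    """RMSLE in its common log(1 + y) form (an assumed definition)."""
    return rmse(np.log1p(y_true), np.log1p(y_pred))


# Hypothetical prices, strictly positive and far from zero.
y_true = np.array([326.0, 2401.0, 5324.0, 18823.0])
y_pred = np.array([350.0, 2200.0, 5600.0, 17500.0])

# RMSE computed on log1p-transformed targets ...
rmse_on_log1p = rmse(np.log1p(y_true), np.log1p(y_pred))
# ... matches RMSLE computed on the raw targets.
rmsle_on_raw = rmsle(y_true, y_pred)

# log1p(x) - log(x) = log(1 + 1/x), which shrinks as x grows.
max_gap = float(np.max(np.abs(np.log1p(y_true) - np.log(y_true))))

# Alternatives mentioned in the answer: sqrt handles zeros,
# cbrt also handles negative values.
sqrt_vals = np.sqrt(np.array([0.0, 4.0, 9.0]))
cbrt_vals = np.cbrt(np.array([-8.0, 0.0, 27.0]))

print(rmse_on_log1p, rmsle_on_raw, max_gap)
print(sqrt_vals, cbrt_vals)
```

With values in the hundreds or thousands, the gap between the two logs is far below typical model error, so the +1 changes little there; it only matters near zero, which is exactly the case where the transformation is questionable.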

mkt