When should I normalize with $\log(1+x)$ instead of with $\log(x)$?

Question

I've seen people log-normalize data using $\log(1+x)$ (np.log1p), for instance normalizing the price of diamonds in the diamonds dataset using log1p.

If the loss function is RMSE, then normalizing with $\log$ is akin to using an RMSLE error. Is there a similar insight when normalizing with $\log(1+x)$?
When should I use $\log(1+x)$ rather than $\log(x)$?
- What if I am guaranteed not to have 0-values or values very close to 0 in my outcome variable?
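To make the RMSE/RMSLE comparison concrete, here is a minimal sketch using only the standard library (`math.log1p` is the scalar counterpart of `np.log1p`). The prices and predictions are made-up illustrative numbers, not values from the diamonds dataset, and `rmse` is a hypothetical helper. It shows that RMSE on $\log$-transformed values is the root mean square of $\log(\hat{y}/y)$, and that for values far from 0 the $\log(1+x)$ version gives almost the same number.

```python
import math


def rmse(a, b):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Made-up, strictly positive prices and predictions (illustrative only).
y_true = [326.0, 554.0, 2757.0, 18823.0]
y_pred = [300.0, 600.0, 2500.0, 20000.0]

# RMSE after a log transform: measures the error in log(y_pred / y_true).
rmse_log = rmse([math.log(v) for v in y_true], [math.log(v) for v in y_pred])

# The same quantity written directly in terms of the ratio y_pred / y_true.
rmse_ratio = math.sqrt(
    sum(math.log(p / t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)
)

# RMSE after a log1p transform: measures log((1 + y_pred) / (1 + y_true)).
rmse_log1p = rmse(
    [math.log1p(v) for v in y_true], [math.log1p(v) for v in y_pred]
)

print(rmse_log, rmse_ratio, rmse_log1p)

# log1p maps 0 to 0, whereas log(0) is undefined.
print(math.log1p(0.0))
```

The two transforms only diverge noticeably when values are near 0, since $\log(1+x) = \log(x) + \log(1 + 1/x)$ and the second term vanishes as $x$ grows.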

If the variable you want to take the log of is your outcome variable in a regression, I suggest using Poisson regression instead, which is fine with zeros. Taking $\ln(1+x)$ or ${\rm asinh}(x)$ is a workaround that avoids thinking seriously about the data-generating process. https://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/ — Student, Nov 08 '19 at 00:48
@student that article is too technical for me. It's unclear what the benefits are in my case (where there are no 0s in the outcome variable). When you say ln(1+x) is a workaround ... a workaround to what? — Aviad Rozenhek, Nov 09 '19 at 19:01
I deleted my comments because I think they were just more confusing. Long story short, if you don’t have zeros, you can take log directly. $\ln(1+x)$ has no theoretical justification that should make you prefer it. — Student, Nov 09 '19 at 19:33
I second "Student"'s comment. This is often a legacy approach from the times before GLM (1980s and before). Now we can choose among diverse distributions for y. And if you want to log-transform and have no 0s, dump the +1. — Carsten, Nov 09 '19 at 20:29

score 1 · Answer 1 · answered Sep 27 '22 at 09:05

As Student mentions in the comments, use $\log(x)$ if you don't have zeroes. If you do have zeroes, though, $\log(1+x)$ is NOT a good solution, as explained well at this thread: Interpreting log-log regression with log(1+x) as independent variable. A better solution if there are zeroes is to use a square root transformation, and if there are negative values to use a cube root transformation.
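As a rough illustration of the root transformations mentioned above, the sketch below applies them to made-up values that include a zero and a negative number. `signed_cbrt` is a hypothetical helper written for this example; `np.cbrt` is the vectorized NumPy equivalent. This only shows that the transforms are defined and invertible on such data, not that they are the right choice for any particular model.

```python
import math

# Made-up values including a negative number and a zero (illustrative only).
values = [-8.0, 0.0, 4.0, 27.0]


def signed_cbrt(v):
    """Real cube root, defined for negative, zero, and positive inputs."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


# The cube root handles every value, including the negative one.
cube_roots = [signed_cbrt(v) for v in values]

# The square root handles zero but is only defined for non-negative values.
square_roots = [math.sqrt(v) for v in values if v >= 0]

# Both transforms are invertible: cubing recovers the original values.
recovered = [r ** 3 for r in cube_roots]

print(cube_roots)
print(square_roots)
print(recovered)
```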
