I've seen people log-normalize data using the $\log(1+x)$ transform (NumPy's `np.log1p`), for instance to normalize the price of diamonds in the diamonds dataset, as in the sketch below.
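
For concreteness, a minimal sketch of the transform I mean (loading the data via seaborn's copy of the diamonds dataset; any copy of the data would do):

```python
import numpy as np
import seaborn as sns

# Diamond prices, via seaborn's bundled copy of the diamonds dataset
diamonds = sns.load_dataset("diamonds")
price = diamonds["price"].to_numpy(dtype=float)

# log1p transform: log(1 + x), which compresses the long right tail
log_price = np.log1p(price)

print(price.min(), price.max())          # heavily right-skewed raw prices
print(log_price.min(), log_price.max())  # far more compact after log1p
```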
- If the loss function is RMSE, then normalizing with $\log$ is akin to using RMSLE. Is there a similar insight when normalizing with $\log(1+x)$? (A quick numerical check follows this list.)
- When should I use $\log(1+x)$ rather than $\log(x)$?
- What if I am guaranteed not to have zero values, or values very close to zero, in my outcome variable? (See the small illustration at the end.)
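
On the first question, my understanding is that RMSLE is usually defined as RMSE of $\log(1+x)$-transformed values, so computing RMSE in $\log(1+x)$ space reproduces it exactly, while plain $\log$ gives something close but not identical. A quick check on synthetic, price-like data (the distribution here is just an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.lognormal(mean=7.0, sigma=1.0, size=1_000)           # skewed, price-like targets
y_pred = y_true * rng.lognormal(mean=0.0, sigma=0.1, size=1_000)  # predictions with multiplicative noise

# RMSLE as usually defined on the original scale
rmsle = np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

# RMSE computed after a log1p transform of targets and predictions
z_true, z_pred = np.log1p(y_true), np.log1p(y_pred)
rmse_log1p = np.sqrt(np.mean((z_pred - z_true) ** 2))

# RMSE after a plain log transform, for comparison
rmse_log = np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2))

print(rmsle, rmse_log1p)  # identical by construction
print(rmse_log)           # close but not equal, since log(1 + x) != log(x)
```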
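
And on the last two questions, here is the behavior near zero I'm thinking about: $\log$ diverges at 0 while $\log(1+x)$ stays finite, and for values well away from zero the two transforms nearly coincide:

```python
import numpy as np

x = np.array([0.0, 1e-12, 0.01, 1.0, 1_000.0])

with np.errstate(divide="ignore"):  # np.log(0) is -inf; silence the warning
    print(np.log(x))    # [-inf, ~-27.6, ~-4.61, 0.0, ~6.91]
print(np.log1p(x))      # [0.0, ~1e-12, ~0.00995, ~0.693, ~6.91]
```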