
I want to transform some data points, which I assume follow an unknown non-standard lognormal distribution, to follow a normal distribution.

When I fit the data with a lognormal distribution using scipy, I get a non-zero value for the loc parameter (shift).

So I use the following procedure:

  1. I fit a lognormal distribution to my data to find the value of the shift parameter (loc)
  2. I shift the data: data - loc
  3. I compute the natural logarithm of the data

In code, the procedure is the following:

```python
import numpy as np
from scipy.stats import lognorm

data = ...  # pandas series
shape, loc, scale = lognorm.fit(data)  # fit shape, loc (shift) and scale
data = np.log(data - loc)  # shift, then apply the natural logarithm
```
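As a self-contained check of the three steps, here is a minimal sketch on synthetic data. The true parameters (shape 0.5, shift 10, scale 3), the sample size and the seed are illustrative assumptions, and plain NumPy arrays stand in for the pandas Series (the same calls work on a Series). The guard refuses to take the log when the fitted `loc` is not below the smallest observation.

```python
import numpy as np
from scipy.stats import lognorm, skew

# Hypothetical synthetic data from a shifted (three-parameter) lognormal.
# The true parameters below are illustrative assumptions, not from the question.
true_s, true_loc, true_scale = 0.5, 10.0, 3.0
rng = np.random.default_rng(0)
data = lognorm.rvs(true_s, loc=true_loc, scale=true_scale, size=5000, random_state=rng)

# Step 1: estimate shape, loc (shift) and scale by maximum likelihood.
shape, loc, scale = lognorm.fit(data)

# Steps 2 and 3: subtract the fitted shift, then take the natural log.
# Refuse to continue if loc >= min(data): the log would be undefined.
if loc >= data.min():
    raise ValueError("fitted loc is not below the smallest observation")
transformed = np.log(data - loc)

print(f"fitted loc = {loc:.3f} (true value {true_loc})")
print(f"skewness after transform = {skew(transformed):+.3f} (0 for a normal)")
```

Note that the fitted `loc` is only an estimate, so the transformed data will be close to, but not exactly, normal.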

In some answers, such as this one, they just take the logarithm of the data and do not address a possible shift.

In my case this would be possible because the data is shifted to the right, so all values are positive, but would I obtain normally distributed data? I think that, depending on the shift, the transformed data would deviate from normality. Am I wrong?

To summarize, these are my questions:

  • Is my procedure correct?
  • If I don't shift the data, to save on the computational cost of fitting the distribution (which is important in my application), do I introduce an error (deviation from normality)? If so, can I quantify it in some way?
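One simple way to quantify the deviation is the skewness of $\log(X)$, which is 0 for a normal distribution. The sketch below assumes the true shift is known and simply skips subtracting it; the shape 0.5, scale 3, sample size and seed are illustrative assumptions, and skewness is only one summary (a Q-Q plot or a formal normality test are alternatives).

```python
import numpy as np
from scipy.stats import lognorm, skew

def log_skewness(shift, s=0.5, scale=3.0, n=200_000, seed=0):
    """Sample skewness of log(X) for X = shift + lognormal(s, scale), no shift removed."""
    rng = np.random.default_rng(seed)
    x = lognorm.rvs(s, loc=shift, scale=scale, size=n, random_state=rng)
    return skew(np.log(x))

# A normal distribution has skewness 0; larger |skewness| = further from normal.
results = {k: log_skewness(k) for k in [0.0, 1.0, 5.0, 20.0]}
for k, g in results.items():
    print(f"shift={k:5.1f}  skewness of log(X) = {g:+.3f}")
```

Since $\log(k + \text{scale}\cdot e^{sZ}) = \log(\text{scale}) + \log(k/\text{scale} + e^{sZ})$, the distortion depends on the shift relative to the scale (and on the shape), not on the shift alone. As the shift grows, the skewness of $\log(X)$ approaches that of the lognormal itself.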
ptrchv
  • This seems mainly a question of terminology. What most people mean by "lognormal" is a random variable whose logarithm has a Normal distribution: that corresponds to the formula in the scipy docs. Sometimes a generalized or three-parameter lognormal is meant in which a lognormal random variable has been (additively) translated, as here. That corresponds to the optional loc argument in the scipy implementation which is not reflected in the formula there. As a rule you should not assume this is what someone means when referring to "lognormal" unless they have stated otherwise. – whuber Feb 09 '23 at 16:00
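To make the terminology concrete: in scipy, `lognorm(s, loc, scale)` describes `loc + scale * exp(s * Z)` with `Z` standard normal, so fixing `floc=0` in the fit gives the usual two-parameter lognormal, while leaving `loc` free gives the three-parameter (shifted) version. A minimal sketch, with illustrative parameters and seed:

```python
import numpy as np
from scipy.stats import lognorm

# Illustrative sample from a standard (loc=0) lognormal; s and scale are assumptions.
s, scale = 0.8, 2.0
rng = np.random.default_rng(1)
x = lognorm.rvs(s, scale=scale, size=10_000, random_state=rng)

# Two-parameter lognormal: hold loc fixed at 0 while fitting.
s2, loc2, scale2 = lognorm.fit(x, floc=0)

# Three-parameter (shifted) lognormal: loc is estimated as well.
s3, loc3, scale3 = lognorm.fit(x)

print(f"floc=0: s={s2:.3f} loc={loc2} scale={scale2:.3f}")
print(f"free:   s={s3:.3f} loc={loc3:.3f} scale={scale3:.3f}")
```

With `loc` fixed at 0, the maximum-likelihood estimates are just the standard deviation and the exponentiated mean of `log(x)`, which is also why the two-parameter fit is cheap.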

1 Answer


The definition of a log-normal distribution is that taking the log gives a normal distribution.

$$ X\sim\text{log-normal} \iff \log\left( X \right)\sim\text{Normal} $$
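A quick numerical check of this equivalence, as a sketch: in scipy's parameterisation with `loc=0`, $X = \text{scale}\cdot e^{sZ}$, so $\log(X)$ is normal with mean $\log(\text{scale})$ and standard deviation $s$. The values of `s`, `scale`, the seed and the evaluation points `t` are illustrative assumptions.

```python
import numpy as np
from scipy.stats import lognorm, norm

# In scipy's parameterisation (loc=0): X = scale * exp(s * Z), Z ~ N(0, 1),
# so log(X) ~ Normal(mean=log(scale), sd=s). s and scale below are illustrative.
s, scale = 0.6, 4.0
rng = np.random.default_rng(2)
x = lognorm.rvs(s, scale=scale, size=100_000, random_state=rng)

logx = np.log(x)
print(f"mean of log(X): {logx.mean():.3f}  (log(scale) = {np.log(scale):.3f})")
print(f"sd of log(X):   {logx.std():.3f}  (s = {s})")

# The CDFs agree exactly: P(X <= t) = P(log X <= log t).
t = np.array([1.0, 4.0, 10.0])
print(np.allclose(lognorm.cdf(t, s, scale=scale),
                  norm.cdf(np.log(t), loc=np.log(scale), scale=s)))
```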

If you believe the first distribution to be log-normal, just take the logarithm to get a normal distribution.

If there is a shift $k$ from the standard location of a log-normal distribution, then $k$ has to be subtracted before taking the logarithm, and $\log\left(X-k\right)$ is normal. If you do not know this shift, however, then its estimate $\hat k$ is subject to error, and $\log\left(X-\hat k\right)$ might not be quite the right transformation to give a normal distribution. For instance, if $k=2$ but $\hat k = 2.1$, then you subtract the wrong location shift. This is particularly problematic if $\hat k\ge X_{(1)}$ (that is, if the estimated location shift is greater than or equal to the smallest observed value, the first order statistic), as that results in taking the logarithm of a value $\le 0$.
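The effect of a mis-estimated shift can be sketched with the example above ($k=2$, $\hat k = 2.1$). The helper `shifted_log` is hypothetical, and the shape 0.25, scale 1, sample size and seed are illustrative assumptions. Under-shifting leaves positive skewness, over-shifting produces negative skewness, and a $\hat k$ at or above the sample minimum cannot be used at all.

```python
import numpy as np
from scipy.stats import lognorm, skew

def shifted_log(x, k_hat):
    """Hypothetical helper: log(x - k_hat), refusing when k_hat >= min(x)."""
    x = np.asarray(x, dtype=float)
    if k_hat >= x.min():
        raise ValueError(f"k_hat={k_hat} is >= the smallest observation {x.min():.4f}")
    return np.log(x - k_hat)

# Illustrative data with true shift k = 2; s and scale are assumptions.
k, s, scale = 2.0, 0.25, 1.0
rng = np.random.default_rng(3)
x = lognorm.rvs(s, loc=k, scale=scale, size=10_000, random_state=rng)

# Under-estimated, exact and over-estimated shift.
skews = {k_hat: skew(shifted_log(x, k_hat)) for k_hat in [1.5, 2.0, 2.1]}
for k_hat, g in skews.items():
    print(f"k_hat={k_hat}: skewness = {g:+.3f}")

# An estimate at or above the sample minimum cannot be used at all.
try:
    shifted_log(x, 3.0)
except ValueError as err:
    print(err)
```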

Dave
    Ok, but according to the fit, my random variable is $X \sim \text{lognormal} + K$, so I'm not sure that $\log(X) \sim \text{Normal}$ holds. – ptrchv Feb 09 '23 at 13:11