I have a dataset X of real numbers x that I assume follows a lognormal distribution.
Based on this, Y = ln(X), with values y = ln(x), is Gaussian.
I want to compute the parameters of X and Y: the mean, the standard deviation and ultimately the coefficient of variation (CoV).
Questions:
Given
X ~ Lognormal (μX , σX)
Y ~ Normal (μY , σY)
> For Y: an estimate of the mean value is μY, an estimate of the standard
> deviation value is σY, and an estimate of the CoV value is σY/μY.
>
> For X: an estimate of the mean value is exp(μX + σX^2/2), an estimate
> of the standard deviation value is SQRT[(exp(σX^2) − 1)*exp(2μX +
> σX^2)], and an estimate of the CoV value is SQRT[exp(σX^2) − 1].
1- Should μX = μY and σX = σY? Or are (μ, σ) calculated separately for each of the datasets X and Y?
2- What are the expressions that relate the parameters, mean, standard deviation and CoV of X and Y?
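For reference, here is a minimal sketch of the standard closed-form relations, assuming that (mu, sigma) are the mean and standard deviation of Y = ln(X), which is also how `np.random.lognormal(mean, sigma)` is parameterized. The function name `lognormal_moments` is only illustrative.

```python
import math

def lognormal_moments(mu, sigma):
    """Mean, standard deviation and CoV of X, given the mean and
    standard deviation (mu, sigma) of Y = ln(X)."""
    mean_x = math.exp(mu + sigma**2 / 2)
    std_x = math.sqrt((math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2))
    cov_x = math.sqrt(math.exp(sigma**2) - 1)  # same as std_x / mean_x
    return mean_x, std_x, cov_x

# Parameters used in the example of this question: mu = 0, sigma = 1.
mean_x, std_x, cov_x = lognormal_moments(0.0, 1.0)
print(mean_x, std_x, cov_x)
```

Note that under this assumption the CoV of X depends only on sigma, not on mu.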
Example:
import numpy as np
import pandas as pd
np.random.seed(0)
LNd = pd.DataFrame(np.random.lognormal(mean=0.0, sigma=1.0, size=1000000), columns=['Values'])
mX = np.mean(LNd.values)
sX = np.std(LNd.values)
print(mX, sX)
Nd = np.log(LNd['Values'])  # Y = ln(X), element-wise; the result is a Series, so it has no columns to rename
mY = np.mean(Nd.values)
sY = np.std(Nd.values)
print(mY, sY)
emX=np.exp(mY+(sY**2)/2)
esX=(np.exp(2*mY+2*(sY**2))-np.exp(2*mY+(sY**2)))**0.5
print(emX, esX)
emY=np.log(mX)-(sX**2)/2
esY=(np.log(1+(sX/mX)**2))**0.5
print(emY, esY)
Why is emY not equal to mY?
In other references (Wikipedia) there are other expressions, but emY is still not equal to mY.
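One possible explanation, sketched under the assumption that mX and sX are the sample mean and standard deviation of X: inverting the two moment formulas gives μY = ln(mX) − σY^2/2 with σY^2 = ln(1 + (sX/mX)^2), so the term subtracted from ln(mX) is half the log-scale variance, not half of sX^2. The helper name `log_params_from_moments` and the use of `np.random.default_rng` are choices made here for illustration, not part of the original example.

```python
import numpy as np

def log_params_from_moments(m_x, s_x):
    """Method-of-moments estimate of the mean and standard deviation
    of Y = ln(X) from the mean and standard deviation of X."""
    var_y = np.log(1.0 + (s_x / m_x) ** 2)  # sigma_Y^2
    mu_y = np.log(m_x) - var_y / 2.0        # subtract sigma_Y^2 / 2, not s_x^2 / 2
    return mu_y, np.sqrt(var_y)

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)
y = np.log(x)

mu_y_hat, sigma_y_hat = log_params_from_moments(x.mean(), x.std())
print(y.mean(), y.std())      # direct estimates from Y
print(mu_y_hat, sigma_y_hat)  # estimates recovered from the moments of X
```

The two printed pairs should agree only approximately, since the sample standard deviation of a lognormal variable converges slowly.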
emY should be mY? – honeybadger Jul 17 '18 at 16:16
μY = f(μX, σX). What is the function f? – jpcgandre Jul 18 '18 at 15:23