Following the very useful answers from Peter Flom, Wayne and many others, I have now started using R, and it gives me a feeling of Python :)
The results are below, but I am not sure how I should proceed from here. The density certainly looks much better after the log transformation. Can you please shed some light on how to do further analysis?
Thanks a lot.
R - Results below:
plot (density (messages$length))

plot (density (log (messages$length)))
summary (messages)
> summary(message$mb)
   Min. 1st Qu.  Median    Mean 3rd Qu.     Max.
0.00665 0.32610 0.88450 2.08500 2.35000 49.13000
qqnorm (messages$length)

=====================================================================
EDIT: Thanks to all for answering!
I have tried qqnorm with log(x) and it looks like a straight line! Does this mean my data pretty much follow a log-normal distribution?
qqnorm (log(messages$length)):

I have also tried to fit a log-normal distribution to my data; the result is below.
> fitdistr(message$mb, densfun="log-normal")
     meanlog        sdlog
  -0.19019347    1.45795269
 ( 0.02003787)  ( 0.01416891)
Does this mean anything ?
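A minimal sketch of what the two fitted numbers represent, run on simulated data because the real message sizes are not available here (the sample size, seed and parameters below are assumptions chosen only for illustration). For a log-normal, the maximum-likelihood estimates are the mean and the (divide-by-n) standard deviation of the logged values; as far as I know, the figures that fitdistr prints in parentheses are the corresponding approximate standard errors.

```r
# Simulated stand-in for message$mb (hypothetical data, not the real messages)
set.seed(1)
x <- rlnorm(5000, meanlog = -0.19, sdlog = 1.46)

lx <- log(x)
n <- length(lx)

# Closed-form maximum-likelihood estimates for a log-normal distribution
meanlog_hat <- mean(lx)
sdlog_hat <- sqrt(mean((lx - meanlog_hat)^2))  # divides by n, not n - 1

# Approximate standard errors of the two estimates
se_meanlog <- sdlog_hat / sqrt(n)
se_sdlog <- sdlog_hat / sqrt(2 * n)

print(c(meanlog = meanlog_hat, sdlog = sdlog_hat))
print(c(se_meanlog = se_meanlog, se_sdlog = se_sdlog))
```

In the output quoted above, the ratio of the two bracketed values is close to the square root of 2, which is consistent with this reading.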
fitdistr function from the MASS package. On your untransformed data, you could try to fit a log-normal distribution: fitdistr(messages$length, densfun="log-normal"). This post might provide some further inputs. – COOLSerdash May 23 '13 at 15:56

qqnorm() too. If there is systematic curvature, lognormal is not quite right, although that doesn't mean that there is a much better candidate. Gamma might be another one to try. My wild guess from the density plot is that lognormal will work better than the gamma. – Nick Cox May 23 '13 at 16:55

qqnorm() applied to that should show a straight line. – Nick Cox May 23 '13 at 22:28

qqPlot function from the car package. Then you could use either qqPlot(messages$length, distribution="lnorm") or qqPlot(log(messages$length), distribution="norm") to draw the QQ-plot on the original scale or on the log scale. The output from fitdistr gives the mean and sd of your distribution on the log scale. – COOLSerdash May 24 '13 at 11:19

install.packages("car"))? That works for me. If you assume that your data follow a log-normal distribution with a mean of -0.19 and a sd of 1.458 on the log scale, you can use the CDF of the normal distribution to calculate the probability that a message exceeds 45M: 1-pnorm(log(45), mean=-0.19019347, sd=1.45795269). This gives a probability of 0.0031. – COOLSerdash May 24 '13 at 12:22

log(messages$length). The mean of your data on the original scale would be $\exp(\mu + \sigma^2/2)$, so around 2.39 (with $\mu=-0.19$ and $\sigma^{2}=1.458^{2}=2.126$). The variance would be $[\exp(\sigma^{2}) - 1]\cdot \exp(2\mu + \sigma^{2})=42.257$. – COOLSerdash May 24 '13 at 12:52
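The arithmetic in the comments can be reproduced with a short sketch that uses only the fitted meanlog and sdlog quoted in the question. The variable names are illustrative, and the plnorm call is an added cross-check rather than something the commenters used.

```r
# Fitted parameters on the log scale, copied from the fitdistr output
meanlog <- -0.19019347
sdlog <- 1.45795269

# Probability that a message exceeds 45 (same units as the data),
# computed on the log scale with the normal CDF
p_exceed <- 1 - pnorm(log(45), mean = meanlog, sd = sdlog)

# Equivalent calculation using the log-normal CDF directly
p_exceed_alt <- plnorm(45, meanlog = meanlog, sdlog = sdlog, lower.tail = FALSE)

# Mean and variance implied by the model on the original scale
mean_orig <- exp(meanlog + sdlog^2 / 2)
var_orig <- (exp(sdlog^2) - 1) * exp(2 * meanlog + sdlog^2)

print(c(p_exceed = p_exceed, p_exceed_alt = p_exceed_alt))
print(c(mean = mean_orig, variance = var_orig))
```

These should agree with the values reported in the comments (about 0.0031, 2.39 and 42.257), up to rounding.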