Consider a random set of numbers that are normally distributed:
x <- rnorm(n=1000, mean=10)
We'd like to know the mean and the standard error on the mean so we do the following:
se <- function(x) { sd(x)/sqrt(length(x)) }
mean(x) # something near 10.0 units
se(x) # something near 0.03 units
Great!
However, let's assume we don't necessarily know that our original distribution follows a normal distribution. We log-transform the data and perform the same standard error calculation.
z <- log(x, base=10)
mean(z) # something near 1 log units
se(z) # something near 0.001 log units
Cool, but now we need to back-transform to get our answer in units NOT log units.
10^mean(z) # something near 10.0 units
10^se(z) # something near 1.00 units
My question: Why, for a normal distribution, does the standard error differ depending on whether it was calculated from the distribution itself or if it was transformed, calculated, and back-transformed? Note: the means came out the same regardless of the transformation.
EDIT #1: Ultimately, I am interested in calculating a mean and confidence intervals for non-normally distributed data, so if you can give some guidance on how to calculate 95% CI's on transformed data including how to back-transform to their native units, I would appreciate it!
END EDIT #1
EDIT #2: I tried using the quantile function to get the 95% confidence intervals:
quantile(x, probs = c(0.05, 0.95)) # around [8.3, 11.6]
10^quantile(z, probs = c(0.05, 0.95)) # around [8.3, 11.6]
So, that converged on the same answer, which is good. However, using this method doesn't provide the exact same interval using non-normal data with "small" sample sizes:
t <- rlnorm(10)
mean(t) # around 1.46 units
10^mean(log(t, base=10)) # around 0.92 units
quantile(t, probs = c(0.05, 0.95)) # around [0.211, 4.79]
10^(quantile(log(t, base=10), probs = c(0.05, 0.95))) # around [0.209, 4.28]
Which method would be considered "more correct". I assume one would pick the most conservative estimate?
As an example, would you report this result for the non-normal data (t) as having a mean of 0.92 units with a 95% confidence interval of [0.211, 4.79]?
END EDIT #2
Thanks for your time!