I am reading Huber's *Robust Statistics* (2nd ed.). On pages 2 and 3 he gives an example, whose basic facts I summarize here. Let $(X_n)$ be a sequence of random variables, and define two measures of spread as follows:
- Mean absolute deviation: $d_n := \frac{1}{n}\sum_{i=1}^n |x_i-\bar x|$, where $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$.
- Standard deviation: $s_n := \sqrt{\frac{1}{n}\sum_{i=1}^n (x_i-\bar x)^2}$.
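Both definitions use divisor $n$, whereas R's built-in `var()` and `sd()` divide by $n-1$. A minimal sketch of the two estimators exactly as defined above; the function names `mean_abs_dev` and `sd_n` are my own, not from Huber or base R:

```r
# d_n: mean absolute deviation about the sample mean, divisor n
mean_abs_dev <- function(x) mean(abs(x - mean(x)))

# s_n: standard deviation with divisor n (R's sd() divides by n - 1)
sd_n <- function(x) sqrt(mean((x - mean(x))^2))

x <- c(1, 2, 3, 4)
mean_abs_dev(x)  # 1
sd_n(x)          # 1.118034, i.e. sqrt(5/4)
```

Computing $s_n$ directly this way avoids having to rescale `var()` by $(m-1)/m$ by hand, where $m$ is the sample size.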
He then mentions that Fisher claimed that, for identically distributed normal observations, $s_n$ is about 12% more efficient than $d_n$. In addition, $s_n$ converges to $\sigma$, while $d_n$ converges to $\sigma\sqrt{2/\pi}\doteq 0.8\sigma$. I have several questions about these statements.
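For reference, here is a sketch of the standard asymptotic argument behind both statements. It assumes the $X_i$ are i.i.d. $N(\mu,\sigma^2)$ and that efficiency is compared after dividing each estimator's asymptotic variance by the square of its limit, so both are put on the scale of $\sigma$. This is my reconstruction, not a quotation from Huber.

Limit of $d_n$: with $Z\sim N(0,1)$,
$$
E|X-\mu| = \sigma\,E|Z| = 2\sigma\int_0^\infty z\,\frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz = \frac{2\sigma}{\sqrt{2\pi}} = \sigma\sqrt{2/\pi}.
$$
Since $\bigl|\,|x_i-\bar x| - |x_i-\mu|\,\bigr| \le |\bar x-\mu|$ for every $i$,
$$
\Bigl| d_n - \frac{1}{n}\sum_{i=1}^n |x_i-\mu| \Bigr| \le |\bar x-\mu| \to 0 \quad \text{a.s.},
$$
so the strong law of large numbers gives $d_n \to \sigma\sqrt{2/\pi}\approx 0.798\,\sigma$ almost surely (and likewise $s_n\to\sigma$).

Asymptotic variances: for the normal, $\operatorname{Var}\bigl((X-\mu)^2\bigr) = 3\sigma^4-\sigma^4 = 2\sigma^4$, and the delta method applied to $s_n=\sqrt{s_n^2}$ gives $\sqrt n\,(s_n-\sigma)\Rightarrow N(0,\sigma^2/2)$. For $d_n$, replacing $\bar x$ by $\mu$ costs only $o_p(n^{-1/2})$ because $\frac{d}{dm}E|X-m| = 2F(m)-1$ vanishes at $m=\mu$ for a symmetric $F$, so $\sqrt n\,\bigl(d_n-\sigma\sqrt{2/\pi}\bigr)\Rightarrow N\bigl(0,\operatorname{Var}|X-\mu|\bigr) = N\bigl(0,\sigma^2(1-2/\pi)\bigr)$. Hence
$$
\mathrm{ARE}(d_n, s_n) = \frac{(\sigma^2/2)\,/\,\sigma^2}{\sigma^2(1-2/\pi)\,/\,(\sigma^2\cdot 2/\pi)} = \frac{1/2}{\pi/2-1} = \frac{1}{\pi-2}\approx 0.876,
$$
i.e. $d_n$ loses roughly 12% efficiency relative to $s_n$. Equivalently, $\sqrt{\pi/2}\,d_n$ is consistent for $\sigma$ with asymptotic variance $(\pi/2-1)\sigma^2/n$, versus $\sigma^2/(2n)$ for $s_n$.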
- How can one prove that $s_n$ is about 12% more efficient? Or at least, where can I find a proof?
- How can one prove that $d_n$ converges to $\sigma\sqrt{2/\pi}\doteq 0.8\sigma$? Again, where can I at least find a proof?
- I ran a simulation to test all of the statements above. Here are the code and output:
```r
n <- 10000 # number of samples
x <- array(list(), n)
set.seed(2014)

for(i in 1:n){
  x[[i]] <- rnorm(10000) # the 10000 here is the size of each sample
}

dn <- rep(0, n) # mad
sn <- rep(0, n) # sd

for(i in 1:1000){
  dn[i] <- mean(abs(x[[i]]-mean(x[[i]]))) # mad
  sn[i] <- sqrt(var(x[[i]])*999/1000) # sd
}

mean(dn) # 0.07979068 check out
mean(sn) # 0.09995901 check out

var(dn)/var(sn) # 0.6371817
```
As the simulation above shows, the claimed 12% efficiency advantage of $s_n$ does not check out. Why is this the case? Did I make an error in my simulation? Thank you!
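A rough check of where the 0.637 comes from, assuming only what the loop above does: it fills 1000 of the 10000 entries of `dn` and `sn` and leaves the rest at 0. By the law of total variance, each vector's variance is then dominated by the jump between 0 and the estimator's limit, so the ratio lands near $(\sqrt{2/\pi})^2 = 2/\pi \approx 0.637$ whatever the efficiency is. The per-sample variances `v_d` and `v_s` are asymptotic approximations for $\sigma = 1$, and all variable names are my own:

```r
p    <- 1000 / 10000      # fraction of dn, sn entries the loop actually fills
m    <- 10000             # size of each sample
mu_d <- sqrt(2 / pi)      # limit of d_n when sigma = 1
mu_s <- 1                 # limit of s_n when sigma = 1
v_d  <- (1 - 2 / pi) / m  # approximate sampling variance of a single d_n
v_s  <- 1 / (2 * m)       # approximate sampling variance of a single s_n

# Predicted means: the 9000 zeros shrink both by a factor of 10
p * mu_d  # about 0.0798, compare mean(dn)
p * mu_s  # 0.1, compare mean(sn)

# Law of total variance for entries that are 0 with prob. 1 - p and
# a draw with mean mu, variance v with prob. p: p * v + p * (1 - p) * mu^2
var_dn <- p * v_d + p * (1 - p) * mu_d^2
var_sn <- p * v_s + p * (1 - p) * mu_s^2

ratio <- var_dn / var_sn
ratio  # close to 2 / pi
```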
`for(i in 1:1000)` but it should be `for(i in 1:n)` when you are calculating $d_n$ and $s_n$!!! Note that `mean(dn)` and `mean(sn)` are off by a factor of 10! – guy Aug 16 '14 at 18:06

`var(dn)` should be replaced by `pi / 2 * var(dn)` since $\sqrt{\pi / 2}\, d_n$ is your consistent estimator of $\sigma$. – guy Aug 16 '14 at 18:15
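For comparison, a sketch of the simulation with both fixes from the comments applied: every sample is used, and $d_n$ is rescaled by $\sqrt{\pi/2}$ before comparing variances. I also changed the setup, so treat it as an illustration rather than a rerun: the sizes `n_rep` and `m` are arbitrary choices of mine, samples are generated on the fly instead of stored (the original keeps $10^8$ doubles in memory), and both estimators use divisor $m$.

```r
set.seed(2014)
n_rep <- 2000  # number of simulated samples (arbitrary choice)
m     <- 1000  # size of each sample (arbitrary choice)

# Each column of sims holds (d_n, s_n) for one N(0, 1) sample of size m
sims <- replicate(n_rep, {
  x <- rnorm(m)
  c(dn = mean(abs(x - mean(x))),       # mean absolute deviation, divisor m
    sn = sqrt(mean((x - mean(x))^2)))  # standard deviation, divisor m
})
dn <- sims["dn", ]
sn <- sims["sn", ]

mean(dn)  # should be near sqrt(2 / pi), about 0.798
mean(sn)  # should be near 1

# Compare on the same scale: sqrt(pi / 2) * d_n is consistent for sigma
eff <- var(sn) / (pi / 2 * var(dn))
eff       # should be near 1 / (pi - 2), about 0.876
```

An efficiency ratio near 0.876 means $d_n$ loses about 12% efficiency relative to $s_n$, which matches Fisher's figure. The exact digits depend on the seed and on `m`, since 0.876 is an asymptotic value.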