Let $X = [94, 10, 100, 100, 16, 14, 100, 100, 70, 88, 100, 100, 12, 100, 100, 58, 32, 100, 32, 36, 98, 0, 100, 100, 100]$
where $X$ contains students' scores (between 0 and 100); note the many full marks.
The question is: which statistics best describe these data, given that they are clearly non-Gaussian?
Option 1
If I fit a Gaussian by maximum likelihood I get
sample mean = 70.4 and SD = 37.96, so mean ± 1 SD gives the interval [32.44, 108.36].
If I instead fit a Gaussian to the data $X$ using normfit in MATLAB and obtain 95% confidence bounds on the mean and standard deviation, I get
$$
\begin{aligned}
\mu &= 70.4, & \mathrm{CI}_{95\%} &= [54.73,\ 86.06] \\
\sigma &= 37.96, & \mathrm{CI}_{95\%} &= [29.64,\ 52.80]
\end{aligned}
$$
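For reference, a minimal MATLAB sketch of how Option 1's numbers can be reproduced (normfit requires the Statistics Toolbox; the variable names are my own):

```matlab
% Scores of the 25 students
X = [94 10 100 100 16 14 100 100 70 88 100 100 12 ...
     100 100 58 32 100 32 36 98 0 100 100 100];

mu    = mean(X);                      % sample mean, 70.4
sigma = std(X);                       % sample SD (N-1 denominator), ~37.96
oneSD = [mu - sigma, mu + sigma];     % ~[32.44, 108.36]

% Gaussian fit with 95% confidence bounds on mu and sigma
[muhat, sigmahat, muci, sigmaci] = normfit(X, 0.05);
```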
Option 2
On the other hand, what if I use a left/right SD instead? That is, report two values, $SD_{left}$ and $SD_{right}$, where
$$
\begin{aligned}
SD_{left} &= \sqrt{\frac{1}{N_{left}} \sum_i I(X_i < \mu)\,(X_i - \mu)^2} &&= 49.94 \\
SD_{right} &= \sqrt{\frac{1}{N_{right}} \sum_i I(X_i \ge \mu)\,(X_i - \mu)^2} &&= 29.45
\end{aligned}
$$
where $\mu = 70.4$ is the mean, $I$ is the indicator function (1 if its argument is true, 0 otherwise), $N_{left} = \sum_i I(X_i < \mu) - 1 = 9$ is the number of samples below the mean (minus 1 to reduce bias, as in the usual $N-1$ correction), and $N_{right} = \sum_i I(X_i \ge \mu) - 1 = 14$ is the number of samples at or above the mean.
In this case the interval around the mean, $[\mu - SD_{left},\ \mu + SD_{right}]$, is [20.46, 99.85], instead of the previous result, [32.44, 108.36].
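The left/right SDs can be computed along the same lines; a sketch, reusing `X` and `mu` from the snippet above:

```matlab
left  = X(X <  mu);                   % the 10 samples below the mean
right = X(X >= mu);                   % the 15 samples at or above the mean

N_left  = numel(left)  - 1;           % 9
N_right = numel(right) - 1;           % 14

SD_left  = sqrt(sum((left  - mu).^2) / N_left);    % ~49.94
SD_right = sqrt(sum((right - mu).^2) / N_right);   % ~29.45

lrInterval = [mu - SD_left, mu + SD_right];        % ~[20.46, 99.85]
```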
Which one shall I go for, 1 or 2?
For the left SD, we take only the samples in $X$ that are less than the mean (70.4) and use them in the SD formula, adjusting $N$ to $N_{left}$.
My question is really about which SD value is more informative: the maximum-likelihood estimate (which assumes a well-behaved Gaussian) or the left/right estimate (which still uses the maximum-likelihood mean of a Gaussian)?
– deepML Aug 01 '12 at 16:25

"In this paper ... the use of the left and right variance is proposed and an index of asymmetry based on them is introduced. Several examples demonstrate its usefulness. The question of evaluating more accurately the dispersion of data about ..."
http://faculty.kfupm.edu.sa/math/anwarj/Research/39IJMEST2004r.pdf
– deepML Aug 01 '12 at 17:12