
I was wondering about the asymptotic efficiency of the Interquartile Range (IQR) in the Gaussian case. I have calculated it empirically using a Monte Carlo estimator, and it appears to be equal to that of the Median Absolute Deviation (MAD) (36.75%).

However, I have not found any literature on this topic. Specifically, I have not found a formula for the covariance between two order statistics, which I need to calculate the variance of the IQR. Indeed, for a vector $X$ of size $N=4n+1$ with $n\in \mathbb{N}^*$, we have: $$\widehat{IQR}(X)=Q(X,.75)-Q(X,.25)=X_{(3n+1)}-X_{(n+1)}$$

So $\text{Var}(\widehat{IQR}(X)) = \text{Var}(X_{(3n+1)})+\text{Var}(X_{(n+1)})-2\text{Cov}(X_{(3n+1)},X_{(n+1)})$.

I've found an approximation of the variance of order statistic here but nothing on the covariance.

If you have any references on efficiency of IQR or on covariance of order statistics of a Gaussian distribution feel free to help me!

Richard Hardy
  • 67,272
zantoox
  • 61
  • 1
    Asymptotically, the median converges to the mean, and for a symmetric distribution the median of the deviations from the median will be obtained at the 0.25 and 0.75-quantile (which will asymptotically have an equal deviation from the median). That's why the asymptotic efficiency of MAD and IQR are the same. – Christian Hennig Jun 22 '23 at 23:52
  • Thank you Christian !! – zantoox Jun 23 '23 at 16:49

1 Answer


Asymptotic distribution of the interquartile range

The asymptotic distribution of the interquartile range for the normal distribution is shown here. Let $f$ be the density, $F$ the CDF, and $F^{-1}(p)$ the population quantile function of a random variable, and write $\xi_{p} = F^{-1}(p)$. Then, the following holds asymptotically: $$ \sqrt{n}\left(\mathrm{IQR} - \left(\xi_{\frac{3}{4}}-\xi_{\frac{1}{4}}\right)\right)\xrightarrow{d} \mathrm{N}\left(0, \frac{1}{16}\left[\frac{3}{f^{2}(\xi_{\frac{3}{4}})}+\frac{3}{f^{2}(\xi_{\frac{1}{4}})}-\frac{2}{f(\xi_{\frac{1}{4}})f(\xi_{\frac{3}{4}})}\right]\right) $$
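As a sketch of where this variance comes from (the standard joint limit for two sample quantiles, stated without proof and assuming $f$ is continuous and positive at both quartiles; the notation $\hat{\xi}_p$ for the sample $p$-quantile is introduced here only for illustration): for $0<p<q<1$, $$ \sqrt{n}\begin{pmatrix}\hat{\xi}_p-\xi_p\\ \hat{\xi}_q-\xi_q\end{pmatrix}\xrightarrow{d} \mathrm{N}\left(0, \begin{pmatrix}\dfrac{p(1-p)}{f^{2}(\xi_p)} & \dfrac{p(1-q)}{f(\xi_p)f(\xi_q)}\\[2ex] \dfrac{p(1-q)}{f(\xi_p)f(\xi_q)} & \dfrac{q(1-q)}{f^{2}(\xi_q)}\end{pmatrix}\right) $$

Setting $p=1/4$ and $q=3/4$ gives $p(1-p)=q(1-q)=3/16$ and $p(1-q)=1/16$, so the variance of the difference is $$ \frac{3}{16f^{2}(\xi_{\frac{3}{4}})}+\frac{3}{16f^{2}(\xi_{\frac{1}{4}})}-2\cdot\frac{1}{16f(\xi_{\frac{1}{4}})f(\xi_{\frac{3}{4}})} $$ The off-diagonal term is the asymptotic covariance of the two quartiles that the question asks about: for a sample of size $N$, it is approximately $\dfrac{1}{16\,N\,f(\xi_{\frac{1}{4}})f(\xi_{\frac{3}{4}})}$.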

For iid observations from a normal distribution $\mathrm{N}(\mu, \sigma^{2})$, this result simplifies to: $$ \sqrt{n}\left(\mathrm{IQR} - 1.349\sigma\right)\xrightarrow{d} \mathrm{N}\left(0, 2.476\sigma^{2}\right) $$ So asymptotically, the standard deviation of the IQR is approximately $1.573\sqrt{\frac{\sigma^{2}}{n}}$.
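The constants for the normal case can be reproduced numerically from the general variance expression. The following is a minimal sketch for $\sigma = 1$; the helper name `iqr_avar` and its arguments are hypothetical and chosen only for illustration.

```r
# Asymptotic variance factor of the sample IQR from the general quantile formula.
# `iqr_avar` is a hypothetical helper: `dens` is a density function and
# `quant` the matching quantile function (both assumptions of this sketch).
iqr_avar <- function(dens, quant) {
  f1 <- dens(quant(1/4))  # density at the lower quartile
  f3 <- dens(quant(3/4))  # density at the upper quartile
  (1/16) * (3 / f3^2 + 3 / f1^2 - 2 / (f1 * f3))
}

# Standard normal case (sigma = 1)
iqr_const <- qnorm(3/4) - qnorm(1/4)  # population IQR in units of sigma
avar <- iqr_avar(dnorm, qnorm)        # variance factor multiplying sigma^2 / n

round(iqr_const, 3)   # scaling constant of the IQR
round(avar, 3)        # asymptotic variance factor
round(sqrt(avar), 3)  # asymptotic standard deviation factor
```

Passing another density/quantile pair to the helper would give the corresponding factor for another distribution, provided the density is positive at both quartiles.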

In summary, the IQR of a normal distribution $\mathrm{N}(\mu, \sigma^{2})$ is asymptotically normally distributed with mean $F^{-1}_{\mu, \sigma^2}(3/4)-F^{-1}_{\mu, \sigma^2}(1/4)$ (i.e. the population IQR) and variance $2.47569\sigma^2/n$.

Let's check that with a small simulation:

# Parameters
mu <- 100
sigma <- 15
n <- 5000

# Asymptotic variance factor of the IQR
varfac <- (1/2)*exp(2*(qnorm(1/2/2, lower.tail = FALSE)/sqrt(2))^2)*pi

# Population IQR
true_iqr <- qnorm(3/4, mu, sigma) - qnorm(1/4, mu, sigma)

# Simulation
set.seed(142857)
res <- replicate(1e5, {
  IQR(rnorm(n, mu, sigma))
})

# Mean and variance of simulated IQRs
mean(res)
[1] 20.2288
var(res)
[1] 0.1117288
varfac*sigma^2/n
[1] 0.111406

For $n=5000$, the agreement is excellent.

Relative efficiency

For the sample standard deviation $s_n$, the delta method gives $$ \sqrt{n}(s_n-\sigma)\xrightarrow{d} \operatorname{N}\left(0, \frac{\mu_4-\sigma^4}{4\sigma^2}\right) $$ where $\mu_4$ is the 4th central moment. For a normal distribution, $\mu_4 = 3\sigma^4$, so this simplifies to $\operatorname{N}\left(0, \sigma^2/2\right)$. A consistent estimator of $\sigma$ for a normal distribution is $\operatorname{IQR}/1.349$, which has an asymptotic variance of $(2.476/1.349^2)\sigma^2=1.361\sigma^2$. Hence, the asymptotic efficiency of the interquartile range relative to the standard deviation is the ratio of their asymptotic variances, namely $(1/2)/1.361=0.367$.
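The efficiency calculation can be sketched in a few lines of R, together with a rough Monte Carlo check. The sample size, number of replications, seed and variable names below are arbitrary choices for illustration, not part of the derivation.

```r
# Asymptotic relative efficiency (ARE) of IQR/k versus the sample SD at the normal
k <- qnorm(3/4) - qnorm(1/4)            # scaling constant, about 1.349
avar_iqr <- pi * exp(qnorm(3/4)^2) / 2  # asymptotic variance factor of the IQR
avar_sigma_iqr <- avar_iqr / k^2        # variance factor of the estimator IQR / k
avar_sd <- 1/2                          # variance factor of the sample SD
are <- avar_sd / avar_sigma_iqr         # asymptotic relative efficiency
are

# Rough Monte Carlo check with standard normal data (illustrative settings)
set.seed(1)
n <- 1000
sims <- replicate(2000, {
  x <- rnorm(n)
  c(sd = sd(x), iqr = IQR(x) / k)       # two estimates of sigma per sample
})
are_mc <- var(sims["sd", ]) / var(sims["iqr", ])
are_mc
```

With only 2000 replications the simulated ratio is noisy, so it should land near the asymptotic value rather than match it exactly.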

COOLSerdash
  • 30,198
  • 4
    Nice work. My conclusion is that like the median and quartiles, the IQR is not precise enough for small to moderate $n$. Check out Gini's mean difference. – Frank Harrell Jun 22 '23 at 19:01
  • 4
    @FrankHarrell Thanks Frank. I'm aware of Gini's mean difference. Its ARE is 97.79% at the normal compared to the standard deviation, and it is much more efficient in heavy-tailed distributions. One main advantage is that, as a U-statistic, GMD is unbiased in finite samples for all distributions with finite first moments (Gerstenberger & Vogel 2015). It really should be reported more often. – COOLSerdash Jun 22 '23 at 19:13
  • Thanks a lot. Really nice work !! – zantoox Jun 23 '23 at 16:48