
I want to calculate the similarity between various samples, but I only know their medians and IQRs. I came across a similar problem here, but the top answer merely states that the newly proposed statistic $u$ follows the same distribution as the $t$ statistic, without explaining why:

$$t = \frac{\text{Mean}_1 - \text{Mean}_2}{\sqrt{\dfrac{s_1^2}{N_1}+\dfrac{s_2^2}{N_2}}}$$

$$u = \frac{\text{Median}_1 - \text{Median}_2}{\sqrt{\pi/2}\,\sqrt{J_1+J_2}}$$ where $$J_1=\frac{IQR_1^2}{1.82 N_1}, \ \ J_2 = \frac{IQR_2^2}{1.82 N_2}$$

I understand that, essentially, we're comparing a difference in average values against variability. Specifically, I don't understand the following parts, which are supposed to make these distributions similar:

  1. Why do we divide the difference in medians by the square root of $\pi/2$ ?
  2. Why do we divide the squared IQR by 1.82?
Tadsz
  • Do you mean you have only the medians and IQRs, and not the sample sizes as well? – Sal Mangiafico May 30 '23 at 14:06
  • The distributional assertion in the referenced answer is incorrect. It should be interpreted as having approximately the same distribution for large sample sizes. – whuber May 30 '23 at 15:48

1 Answer


The answer that you cited says: This is because the variances of sample medians are $\pi/2$ times the variances of sample means, and the IQRs are 1.82 times the standard deviations.

Here are more details:

A $t$-distributed variable with $d$ degrees of freedom is defined as the ratio $X/S$ of a standard normal ($N(0,1)$) variable $X$ and an independent variable $S$ for which $d\cdot S^2$ has a $\chi^2$ distribution with $d$ degrees of freedom.

If sample 1 and sample 2 both come from an $N(\mu,\sigma^2)$ distribution, then the numerator $(\mathrm{Mean}_1 - \mathrm{Mean}_2)$ has a normal distribution with mean 0 and variance $\tau^2 = \sigma^2(1/N_1 +1/N_2)$. If you divide $(\mathrm{Mean}_1 - \mathrm{Mean}_2)$ by $\tau = \sqrt{\sigma^2(1/N_1 +1/N_2)}$, then you get an $N(0,1)$ distributed variable.

If you divide the denominator, $\sqrt{S_1^2/N_1 + S_2^2/N_2}$, by $\tau$, then $d$ times the square of this rescaled denominator has a $\chi^2$ distribution with $d=N_1+N_2-2$ degrees of freedom (exactly when $N_1=N_2$, and approximately otherwise). Therefore, $$ T = \frac{\mathrm{Mean}_1 - \mathrm{Mean}_2}{\sqrt{S_1^2/N_1 + S_2^2/N_2}} = \frac{(\mathrm{Mean}_1 - \mathrm{Mean}_2)/\tau}{\sqrt{S_1^2/N_1 + S_2^2/N_2}/\tau} $$ follows, at least approximately, a $t$-distribution with $d=N_1+N_2-2$ degrees of freedom.

Now, if you replace the sample means by sample medians, the numerator of your $t$-variable has a larger spread. For large samples it is approximately $N(0, \tau^2\cdot\pi/2)$ distributed, but dividing by $\sqrt{\pi/2}$ brings you approximately back to the $N(0,\tau^2)$ distribution of the original numerator, $\mathrm{Mean}_1 - \mathrm{Mean}_2$.
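The $\pi/2$ factor can be checked with a small simulation. The following is only a rough sketch, not part of the cited answer: the function name `variance_ratio`, the sample size, the number of repetitions, and the seed are all arbitrary choices made for illustration. It draws many standard normal samples and compares the variance of their medians with the variance of their means.

```python
import math
import random
import statistics


def variance_ratio(n=101, reps=4000, seed=0):
    """Estimate Var(sample median) / Var(sample mean) for N(0,1) samples.

    n, reps and seed are illustrative assumptions, not values from the answer.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(medians) / statistics.variance(means)


ratio = variance_ratio()
print(f"simulated ratio: {ratio:.3f}, pi/2: {math.pi / 2:.3f}")
```

For moderately large $n$ the simulated ratio should land near $\pi/2 \approx 1.571$, though it will not match exactly because the result is asymptotic and the simulation is noisy.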

Similarly, the IQR of a standard normal distribution is about 1.349, and the IQR scales with the standard deviation for general normal distributions $N(\mu,\sigma^2)$. Thus, $\mathrm{IQR}/1.349$ is an estimator for $\sigma$, and $\mathrm{IQR}^2/1.349^2\approx\mathrm{IQR}^2/1.82$ is an estimator for $\sigma^2$. Replacing $S^2$ with $\mathrm{IQR}^2/1.82$ gets you approximately back to the initial denominator.
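To make the two constants concrete, here is a minimal sketch that derives 1.349 and 1.82 from the normal quantile function and then evaluates the statistic $u$ from the question. The function name `u_statistic` and its argument names are hypothetical, chosen for this illustration; the sketch assumes both samples are roughly normal and reasonably large.

```python
import math
from statistics import NormalDist

# IQR of a standard normal: distance between the 25% and 75% quantiles.
IQR_STD_NORMAL = 2 * NormalDist().inv_cdf(0.75)  # about 1.349
IQR_SQ_FACTOR = IQR_STD_NORMAL ** 2              # about 1.82


def u_statistic(median1, iqr1, n1, median2, iqr2, n2):
    """Median/IQR analogue of the two-sample t statistic (hypothetical helper)."""
    # IQR^2 / 1.82 estimates sigma^2, so j1 and j2 mimic s^2 / N.
    j1 = iqr1 ** 2 / (IQR_SQ_FACTOR * n1)
    j2 = iqr2 ** 2 / (IQR_SQ_FACTOR * n2)
    # sqrt(pi/2) compensates for the larger spread of sample medians.
    return (median1 - median2) / (math.sqrt(math.pi / 2) * math.sqrt(j1 + j2))


# Made-up summary statistics, purely for illustration.
u = u_statistic(median1=5.2, iqr1=2.1, n1=40, median2=4.6, iqr2=1.8, n2=35)
print(f"IQR factor: {IQR_STD_NORMAL:.4f}, squared: {IQR_SQ_FACTOR:.4f}, u = {u:.3f}")
```

The resulting $u$ can then be compared with a $t$ (or, for large samples, normal) reference distribution, keeping in mind that the match is only approximate.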

Ute