11

Given a parametric model $f_\theta$ and a random sample $X = (X_1, \cdots, X_n)$ from this model, a statistic $T(X)$ is sufficient if the distribution of $X$ given $T(X)$ does not depend on $\theta$.

Is there a standard way to compare two non-sufficient statistics to tell which one is closer to being sufficient? Some kind of measure of how far a statistic is from being sufficient?

Consider a simple normal location model: $f_\theta(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x - \theta)^2}$. Although neither the sample median nor the sample variance is sufficient, the sample median is more informative about $\theta$ than the sample variance. I would expect the median to be close to sufficient and the sample variance to be as far as possible from being sufficient, since it is ancillary.

Pohoua
  • 2,548
  • Is this the kind of thing you mean? https://arxiv.org/pdf/1905.07822.pdf – steveLangsford Apr 04 '22 at 10:06
  • 1
    English usage only: sufficiency is the noun. – Nick Cox Apr 04 '22 at 10:13
  • 1
    @Xi'an Thanks for your answer, but I'm not sure I get what you mean by "information". Is there a standard definition for the information of a statistic? Fisher information is relative to the model rather than to a statistic. I thought, in a Bayesian setup, of comparing the entropy of $f(\theta\mid X)$ with that of $f(\theta \mid T(X))$, which would be equal if $T(X)$ is sufficient, but I expect that this would depend on the prior; plus, sufficiency has a proper meaning in a frequentist context. – Pohoua Apr 04 '22 at 12:57
  • @steveLangsford Thanks for the reference. I'm looking it up. – Pohoua Apr 04 '22 at 12:59

1 Answer

13

Fisher's information associated with a statistic $T$ is the Fisher information associated with the distribution of that statistic: $$I_T(\theta) = \mathbb E_\theta\left[\left\{\frac{\partial}{\partial \theta}\log f^T_\theta(T(X))\right\}^\prime \frac{\partial}{\partial \theta}\log f^T_\theta(T(X))\right]$$ It is thus possible to compare Fisher informations across statistics. For instance, the Fisher information associated with a sufficient statistic is the same as that of the entire sample $X$. At the other end of the spectrum, the Fisher information provided by an ancillary statistic is zero.
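
In the normal location model of the question this can be illustrated numerically: the sample mean is sufficient, and the Fisher information of its $\mathcal N(\theta, 1/n)$ distribution equals that of the whole sample, $n$, while the distribution of the sample variance does not involve $\theta$, so its information is zero. A minimal Monte Carlo sketch of the first claim (in R; the sample size and number of replications are illustrative choices, not taken from the answer):

```r
# Fisher information of the sample mean in a N(theta, 1) model.
# The mean T ~ N(theta, 1/n), so its score is n * (T - theta) and
# I_T(theta) = E[(n * (T - theta))^2] = n, the information of the full sample.
# The sample variance has a theta-free distribution ((n - 1) S^2 ~ chi^2_{n-1}),
# so its score is identically zero and its Fisher information is 0.
set.seed(1)
theta <- 2; n <- 50; M <- 1e5   # illustrative values
xbar  <- replicate(M, mean(rnorm(n, theta, 1)))
mean((n * (xbar - theta))^2)    # close to n = 50
```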

Finding Fisher's information provided by the sample median is somewhat of a challenge. However, running a Monte Carlo experiment with $n$ large shows that the variance of the median is approximately 1.5–1.7 times larger than the variance of the empirical mean, which implies that the Fisher information is approximately 1.5–1.7 times smaller for the median. The exact expression of the (constant) Fisher information about $\theta$ attached to the median statistic $X_{(n/2)}$ of a $\mathcal N(\theta,1)$ sample is $$1 - \mathbb E_0\left[\frac{\partial^2}{\partial \theta^2} \left\{ (n/2 - 1) \log \Phi (X_{(n/2)}) + (n - n/2) \log\Phi (-X_{(n/2)} )\right\}\right] $$ where the expectation is taken under $\theta=0$. It also writes as $$1+n\mathbb E[Z_{n/2:n}\varphi(Z_{n/2:n})]-n\mathbb E[Z_{n/2:n-1}\varphi(Z_{n/2:n-1})]+\\ \frac{n(n-1)}{n/2-2}\varphi(Z_{n/2-2:n-2})^2+ \frac{n(n-1)}{n-n/2-1}\varphi(Z_{n/2:n-2})^2\tag{1}$$ (after correction of a typo in the thesis).
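
A sketch of such a Monte Carlo experiment (in R; the value of $n$ and the number of replications are illustrative choices):

```r
# Sampling variances of the median and the mean for N(theta, 1) samples.
# Asymptotically Var(median) ~ 1 / (4 n phi(0)^2) = pi / (2 n), while Var(mean) = 1 / n,
# so the ratio of the two variances should approach pi/2 ~ 1.5708 as n grows.
set.seed(1)
theta <- 0; n <- 1e3; M <- 1e4   # illustrative values
sims  <- matrix(rnorm(n * M, theta, 1), nrow = M)
v_med <- var(apply(sims, 1, median))
v_avg <- var(rowMeans(sims))
v_med / v_avg                    # in the 1.5-1.7 range, near pi/2 for large n
```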

As stated in this same thesis

The median order statistics contain the most information about $\theta$. (...) For $n = 10$, $X_{5:10}$ and $X_{6:10}$ each contain 0.6622 times the total information in the sample. For $n = 20$, the proportion of information contained in the median statistic is 0.6498.

Since $1/0.6498 = 1.5389$, this is already close to $1/\{4\varphi(0)^2\} = \pi/2 \approx 1.5708$, while a Monte Carlo approximation of (1) returns $1.5706$ for $n=10^4$.
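
For reference, the limiting value quoted above can be checked directly (a one-line computation in R):

```r
# Limiting variance ratio of the median to the mean: 1 / (4 * phi(0)^2) = pi / 2.
1 / (4 * dnorm(0)^2)   # 1.570796...
pi / 2                 # 1.570796...
```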

Xi'an
  • 105,342
  • 1
    Thank you! That's exactly what I was looking for. – Pohoua Apr 04 '22 at 15:15
  • 2
    (+1) Sometimes the CDF is not differentiable even if it is continuous. In some cases the weak derivative of the CDF can be used in place of the PDF, so this already general approach of using the Fisher information can be extended to such cases. – Galen Apr 04 '22 at 17:18
  • 1
    Purportedly the sample median has an asymptotic variance of $\frac{1}{4n f(m)^2}$, where $n$ is the sample size, $f$ is the density of the original random variable $X$, and $m$ is the median of $f$. – Galen Apr 04 '22 at 17:46
  • @Xi'an I was puzzled by the fact that the variance of the median is the inverse of its Fisher information (it's obviously not the case for the sample variance), although it's definitely the case (checked in R). I made sense of that by noticing that in a location model, the median is the argmax in $\theta$ of $f_{med(X)\mid\theta}(med(X))$ (the median's cdf is $F_\theta (m) = C_n^{n/2}[F_X(m)(1 - F_X(m))]^{n/2}$, where $F_X$ is the cdf of the data). Is this a characterization of statistics for which the inverse Fisher information equals the asymptotic variance? – Pohoua Apr 05 '22 at 08:46
  • That the variance is the inverse information is true for the sample variance: https://stats.stackexchange.com/q/316327/7224 – Xi'an Apr 05 '22 at 09:03