33

I came across a casual remark on The Chemical Statistician that a sample median could often be a choice for a sufficient statistic but, besides the obvious case of one or two observations where it equals the sample mean, I cannot think of another non-trivial and iid case where the sample median is sufficient.

Xi'an
  • 105,342
  • 1
    Did you mean to write "that a sample median could often be"? – Juho Kokkala Nov 06 '14 at 13:55
  • 10
    It's an interesting question; the double exponential has the median for a ML estimator of its location parameter, but it's not sufficient. – Glen_b Nov 06 '14 at 15:57
  • 1
    honestly, i strongly feel that something is missing in this Q&A, how is it possible that a ML estimator is not sufficient for itself? sorry for just throwing my doubts this way, i was never really interested in sufficient statistics. – carlo Aug 15 '20 at 07:28
  • @carlo: what do you mean by "sufficient for itself"? – Xi'an Aug 15 '20 at 19:47
  • @Xi’an, I deleted my comment, but I’ll see if I can turn the ideas into a proper answer. – Matt F. Jun 29 '21 at 10:33

2 Answers

25

In the case when the support of the distribution does not depend on the unknown parameter $\theta$, we can invoke the (Fréchet-Darmois-)Pitman-Koopman theorem, namely that, when a sufficient statistic of fixed dimension exists, the density of the observations is necessarily of the exponential family form, $$ \exp\{ \theta T(x) - \psi(\theta) \}h(x). $$ Since the natural sufficient statistic $$ S=\sum_{i=1}^n T(x_i) $$ is also minimal sufficient, a sufficient median should be a function of $S$, and the other way as well: $S$ should be a function of the median. This is impossible: modifying an extreme in the observations $x_1,\ldots,x_n$, $n>2$, modifies $S$ but does not modify the median. Therefore, the median cannot be sufficient when $n>2$.
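As a small numerical illustration of that last step (a sketch only: the choice $T(x)=x$, i.e. the Normal mean model, and the sample values are my own assumptions, not part of the argument), moving the largest observation changes $S$ but leaves the median untouched:

```python
from statistics import median

def natural_statistic(xs, T=lambda x: x):
    """Sum of T(x_i); T(x) = x is the Normal-mean case (illustrative assumption)."""
    return sum(T(x) for x in xs)

sample = [0.3, 1.1, 2.0, 2.7, 4.5]   # made-up data, n = 5
shifted = sample[:-1] + [45.0]       # move only the largest observation

same_median = median(sample) == median(shifted)
same_S = natural_statistic(sample) == natural_statistic(shifted)
print(same_median, same_S)  # True False
```

Since two samples with the same median give different values of $S$, $S$ cannot be a function of the median.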

In the alternative case when the support of the distribution does depend on the unknown parameter $\theta$, I am less happy with the following proof. First, we can wlog consider the simple case when $$ f(x|\theta) = h(x) \mathbb{I}_{A_\theta}(x) \tau(\theta) $$ where the set $A_\theta$, indexed by $\theta$, denotes the support of $f(\cdot|\theta)$. In that case, assuming the median is sufficient, the factorisation theorem implies that $$ \prod_{i=1}^n \mathbb{I}_{A_\theta}(x_i) $$ is a binary ($0-1$) function of the sample median, $$ \prod_{i=1}^n \mathbb{I}_{A_\theta}(x_i) = \mathbb{I}_{B^n_\theta}(\text{med}(x_{1:n})) $$ for some set $B^n_\theta$. Indeed, there is no extra term in the factorisation since it should also be (i) a binary function of the data and (ii) independent of $\theta$. Adding a further observation $x_{n+1}$ whose value is such that it does not modify the sample median then leads to a contradiction, since this observation may be inside or outside the support set, while $$ \mathbb{I}_{B^{n+1}_\theta}(\text{med}(x_{1:n+1}))=\mathbb{I}_{B^n_\theta}(\text{med}(x_{1:n}))\times \mathbb{I}_{A_\theta}(x_{n+1}). $$
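To make the parameter-dependent-support case concrete, here is a minimal sketch with the Uniform$(0,\theta)$ model, which is an example I am assuming for illustration (the data values are made up too). Two samples share the same median, yet the product of indicators, and hence the likelihood, differs between them for some values of $\theta$ and not for others:

```python
from statistics import median

def uniform_likelihood(theta, xs):
    """Likelihood of an iid Uniform(0, theta) sample:
    theta**(-n) when every x_i lies in [0, theta], and 0 otherwise."""
    if min(xs) < 0 or max(xs) > theta:
        return 0.0
    return theta ** (-len(xs))

x = [0.5, 1.0, 2.5]   # made-up data, median 1.0, maximum 2.5
y = [0.5, 1.0, 4.0]   # same median, larger maximum
assert median(x) == median(y)

thetas = (3.0, 5.0)
lik_x = [uniform_likelihood(t, x) for t in thetas]
lik_y = [uniform_likelihood(t, y) for t in thetas]
print(lik_x, lik_y)
```

At $\theta=3$ the first sample has positive likelihood and the second has likelihood zero, while at $\theta=5$ both likelihoods agree, so the indicator product cannot be a function of the median alone.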

User1865345
  • 8,202
Xi'an
  • 105,342
  • 1
    What is set $B_\theta^n$? – 3x89g2 May 03 '17 at 05:56
  • 4
    Isn't median sufficient for Laplace when scale is known? – Cagdas Ozgenc Jan 06 '20 at 13:15
  • 2
    No, it is not. Because Laplace is not within exponential families. – Xi'an Jan 06 '20 at 13:38
  • A question from a less technically literate user: what is the takeaway from each of the two cases and the combined takeaway? – Richard Hardy Aug 11 '20 at 09:34
  • 2
    You say and the case is solved. I am just wondering what the conclusion is. Is it that in this setting (support not depending on $\theta$), median is a sufficient statistic? Or is not a sufficient statistic? What about the second case? What do we learn about median as a sufficient statistic from the two cases taken together? I am not questioning the technical points you make, but I am trying to extract a takeaway message, an answer to the question you raise in the OP. You may think everything is obvious from what you have already written, but it might not be obvious for everyone (e.g. me). – Richard Hardy Aug 13 '20 at 07:05
  • Also, could you please add @RichardHardy in the comments to me? I did not get a notification of your last comment. I came back because I remembered I had posted a comment and I discovered there was an answer already. Thank you! – Richard Hardy Aug 13 '20 at 07:07
  • 3
    The answer to the question is that the median cannot be a sufficient statistic, except in trivial cases (like $n=1$). – Xi'an Aug 13 '20 at 07:08
  • 1
    @Xi'an, great! This is what I needed. Thank you very much! – Richard Hardy Aug 13 '20 at 07:08
  • 1
    @Xi'an "(median) is not sufficient for Laplace because it's not an exponential family" - are you saying only exponential families have sufficient statistics? Isn't the maximum a sufficient statistic for a uniform (0, $\theta$) distribution? – AdamO Feb 23 '23 at 22:16
  • @AdamO the pdf of the uniform distribution can be written with an indicator function, but one could also place this dependency into the support such that the pdf is plainly a $f(x) = 1/\theta$ and an exponential family. – Sextus Empiricus Feb 24 '23 at 00:17
  • But indeed one can have more than just the exponential family with sufficient statistics. In the case of the exponential family it satisfies the factorisation theorem by definition $$f(x) \propto h(x) \cdot \exp(\eta(\theta) \cdot T(x))$$ where the function $\eta(\theta) \cdot T(x)$ is specifically a linear function, but one could also have a non-linear function, as with the uniform distribution. – Sextus Empiricus Feb 24 '23 at 00:25
  • @AdamO Some families with parameter-dependent support (like uniforms) indeed enjoy fixed dimension sufficient statistics and are often called quasi-exponential for that reason. – Xi'an Feb 24 '23 at 07:07
  • 2
    @SextusEmpiricus To have the Uniform density written as an exponential family density $$f(x) \propto h(x) \cdot \exp(\eta(\theta) \cdot T(x))$$ is not feasible, meaning that most analytic properties associated with exponential families are not extendable to the Uniform distribution. – Xi'an Feb 24 '23 at 07:16
  • @Xi'an I agree. It can only be written as an exponential density like this $f(x) = \exp(-\ln(\theta))$, but then the dependency of the support on the distribution parameters is lost in the function. So that is a naive approach. Still, the second comment remains relevant: the exponential family is not the only type of distribution for which there is a sufficient statistic. – Sextus Empiricus Feb 24 '23 at 10:41
4

Xi'an's answer raises the question of what is happening with the Laplace (double exponential) distribution, where the MLE is the median. As Xi'an says, the median is not sufficient in this model; it can't be, because among families with constant support only exponential families have finite-dimensional sufficient statistics.

The log likelihood looks like this:

[image: plot of the log likelihood]

It's piecewise linear, increasing below the median and decreasing above the median. Note that it has a 'corner' at each observed $x$; you can see it doesn't factorise into a term that doesn't depend on parameter $\theta$ and a term that depends only on $\theta$ and the median.
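A quick numerical check of this, as a sketch under my own assumptions (unit scale, made-up data): take two samples with the same median and compare their Laplace log likelihoods at two values of $\theta$. If the median were sufficient, the difference would not depend on $\theta$.

```python
import math
from statistics import median

def laplace_loglik(theta, xs, scale=1.0):
    """Log likelihood of an iid Laplace(theta, scale) sample; scale assumed known."""
    n = len(xs)
    return -n * math.log(2 * scale) - sum(abs(x - theta) for x in xs) / scale

x = [-1.0, 0.0, 1.0]   # made-up data, median 0
y = [-3.0, 0.0, 5.0]   # same median, different spread
assert median(x) == median(y) == 0.0

# Difference of log likelihoods at theta = 0 and theta = 2.
gaps = [laplace_loglik(t, x) - laplace_loglik(t, y) for t in (0.0, 2.0)]
print(gaps)  # about 6 at theta = 0 and about 4 at theta = 2
```

The difference changes with $\theta$ because the corners of the two log likelihoods sit at different observed values, so the likelihood does not factorise through the median.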

Now, why is this possible? The MLE is necessarily a function of any sufficient statistic, but it's not necessarily true that there's a sufficient statistic that's a function of the MLE. This seems to imply there's information in the data that doesn't make it into the MLE. Whether that's true or not depends on how you define 'information in the data', but this paper shows that the median fails to attain the Cramér-Rao bound for the Laplace model by a factor of about $1+8/n$.

What is true about the MLE is that it's asymptotically efficient. It's also asymptotically sufficient in some sense. For example, the Convolution theorem says that (under weak conditions) any estimator is equal to the asymptotically efficient estimator plus noise that doesn't depend on the parameter. Lin and Zeng's paper showing there's no asymptotic efficiency loss in meta-analysis using summary statistics is also relevant here.

Thomas Lumley
  • 38,062
  • (+1) The M.L.E. isn't always asymptotically efficient, though there are some regularity conditions that will guarantee it is. Estimation of $\theta$ for i.i.d. r.v.s distributed uniformly between 0 & $\theta$ provides a good counter-example. – Scortchi - Reinstate Monica Feb 24 '23 at 19:04
  • @Scortchi-ReinstateMonica: in that Uniform case, it is super-efficient. – Xi'an Feb 25 '23 at 11:05