
Consider a simple example: let $X_{1},\dots,X_{n}$ be i.i.d. uniform on the interval $[\theta,\theta+1]$. By the strong law of large numbers, I may conclude that $$\overline{X}\rightarrow \theta+\frac{1}{2}\quad\text{almost surely.}$$ However, it is not clear what the pdf or cdf of $\overline{X}$ is, since computing the cdf $F_{\overline{X}}(t)=P\left(\sum_{i=1}^{n} X_{i}\le nt\right)$ requires an $n$-dimensional integral over the region $\theta\le x_{i}\le\theta+1$. Nevertheless it seems "intuitive" that $\overline{X}$ should be a good statistic. But in practice one soon learns that the real sufficient statistic is $(X_{(1)},X_{(n)})$, and proving this via the factorization theorem is not difficult.
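For completeness, the factorization computation is just $$f(x_1,\dots,x_n\mid\theta)=\prod_{i=1}^{n}\mathbf{1}\{\theta\le x_i\le\theta+1\}=\mathbf{1}\{\theta\le x_{(1)}\}\,\mathbf{1}\{x_{(n)}\le\theta+1\},$$ which depends on the data only through $(x_{(1)},x_{(n)})$.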

I decided to ask the professor about this after class. He told me that $\overline{X}$ is not a good statistic because it is not close enough to $\theta$, and as statisticians one has to consider real-life applications. But this explanation is not persuasive, since I can use $\overline{X}-\frac{1}{2}$ for the same purpose. I want to ask whether $\overline{X}$ is really a bad statistic for this example, and if so, for what reason. I also want to know whether $\overline{X}$ is a sufficient statistic.
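(A quick sanity check on the shift: since $E[X_i]=\theta+\frac{1}{2}$, $$E\left[\overline{X}-\tfrac{1}{2}\right]=\theta,$$ so $\overline{X}-\frac{1}{2}$ is unbiased for $\theta$ and, by the SLLN, consistent.)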

  • "the expectation" is not a statistic at all: $\bar X$ is a sample mean, not an expectation. Do you know the definition of a sufficient statistic, and any ways to check for sufficiency? You may find this discussion of the more general case of some help. (The short answer is: $\bar X$ is not sufficient.) – Glen_b Feb 17 '14 at 04:37
  • @Glen_b: Yes, thanks for the reminder about the correction. I do not see how to apply it in the $\overline{X}$ case, since $\overline{X}$ does not appear in the joint pdf $\mathbf{1}\{\theta\le X_{(1)}\}\,\mathbf{1}\{X_{(n)}\le\theta+1\}$. – Bombyx mori Feb 17 '14 at 04:42
  • I'm not sure what 'it' refers to in your second sentence. If you are referring to my link, the idea is to try to follow what's going on in the general case in order to understand in general terms how to apply the factorization theorem to uniform cases. – Glen_b Feb 17 '14 at 04:43
  • @Glen_b: I meant the definition of a sufficient statistic, or the factorization theorem. Thanks for the short answer – but how does one show it? Shouldn't the SLLN imply it is a good statistic? – Bombyx mori Feb 17 '14 at 04:45
  • Beware undefined words like 'good'. SLLN certainly doesn't imply sufficiency. (I was editing my earlier comment when you replied to it, you might like to reread it also.) – Glen_b Feb 17 '14 at 04:46
  • @Glen_b: I thought converging almost surely is rather "strong". So given enough samples I can conclude that the average of a coin toss should result in $0$. I see. – Bombyx mori Feb 17 '14 at 04:49
  • Rather strong in what sense? Not in any sense that helps with sufficiency, nor with getting estimators with good properties. – Glen_b Feb 17 '14 at 05:04
  • Incidentally, here is how the mean $-\,0.5$ estimator performs compared to a better estimator on your problem for $n=50$ (a simulation along these lines is sketched after these comments). You can see that the mean is typically further from the true value (which was 3.4) than the values from the other estimator. The mean does much worse as $n$ increases. – Glen_b Feb 17 '14 at 05:13
  • in brief: The SLLN does apply. The mean-0.5 does converge to $\theta$, but the mean is not sufficient for $\theta$. In large samples, the mean is a worse estimator than many other choices, and a much worse estimator than the best available one. – Glen_b Feb 17 '14 at 05:25
  • @Glen_b: I mean it converges almost surely, which is rather strong to me. I see. Thanks for the graph. – Bombyx mori Feb 17 '14 at 05:27
  • Imagine the following conditions hold for a pair of estimators: I have two unbiased estimators of some parameter (to make life simple), and the variance of one estimator is proportional to $1/\log(n)$ and the variance of another estimator is proportional to $1/n^2$. They both converge almost surely. In large samples, will they be equally useful? – Glen_b Feb 17 '14 at 05:30
  • Here's something for you to ponder: consider random samples from a normal distribution where we average all of the data and where we average only the first $\lceil\sqrt n\rceil$ observations. As $n\rightarrow\infty$, do both have almost sure convergence? – Glen_b Feb 17 '14 at 05:37
  • @Glen_b: Sorry, I am about to sleep. For your first question, the one with variance $1/n^{2}$ should be better. For your second question, I think both converge almost surely, since in the second case $\overline{X}$ comes from a sample whose size still goes to infinity. But their variances are different: the second one's variance is $\sqrt{n}$ times that of the first, since the variance of $\overline{X}$ is $\frac{\sigma^{2}}{m}$ for a sample of size $m$. – Bombyx mori Feb 17 '14 at 05:55
  • Do you see then that the almost sure convergence is itself not much use when considering which statistics might contain all the available information, or which estimators will have small variance? – Glen_b Feb 17 '14 at 06:45
  • @Glen_b: Yes. I think in professional terms this means $\overline{X}$ is not the MSE. – Bombyx mori Feb 17 '14 at 14:48
  • I think you may have made a typo there; I guess you probably intended "MLE". Which, indeed, it isn't, just as you say, but that's not what I was getting at there; I was simply pointing out that your intuition about what was 'strong' about the SLLN was misplaced. I was hoping you may have obtained a better sense of what it tells you and doesn't tell you. – Glen_b Feb 17 '14 at 18:15
  • @Glen_b: Sorry, I mean $\overline{X}$ is not the one which minimizes the MSE. Thanks for the kind reminder. – Bombyx mori Feb 17 '14 at 18:46
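
To make Glen_b's comparison concrete, here is a minimal simulation sketch (Python with NumPy; the seed, replication count, and the midrange-based competitor are my own choices, not from the thread) comparing $\overline{X}-0.5$ with an estimator built from the sufficient statistic, using $\theta=3.4$ and $n=50$ as in the comments:

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for reproducibility
theta, n, reps = 3.4, 50, 10_000

# reps samples of size n from Uniform[theta, theta + 1]
x = rng.uniform(theta, theta + 1.0, size=(reps, n))

# Estimator 1: sample mean shifted by 1/2; consistent by the SLLN,
# with error of order 1/sqrt(n).
est_mean = x.mean(axis=1) - 0.5

# Estimator 2: midrange shifted by 1/2; a function of the sufficient
# statistic (X_(1), X_(n)), with error of order 1/n rather than 1/sqrt(n).
est_mid = 0.5 * (x.min(axis=1) + x.max(axis=1)) - 0.5

for name, est in [("mean - 1/2", est_mean), ("midrange - 1/2", est_mid)]:
    rmse = np.sqrt(np.mean((est - theta) ** 2))
    print(f"{name:>15s}: RMSE = {rmse:.5f}")
```

In runs of this sketch the midrange-based estimator's RMSE comes out at roughly a third of the shifted mean's at $n=50$, and the gap widens as $n$ grows, matching the behaviour described in the comments.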

1 Answer


Sufficiency pertains to data reduction, not estimation per se. This is an important distinction to understand. Yes, a "good" estimator is usually a function of a sufficient statistic, but that doesn't mean that all sufficient statistics are estimators.
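For reference, the standard definition: a statistic $T$ is sufficient for $\theta$ if the conditional distribution of the sample given $T$ is free of $\theta$, $$P(X_1,\dots,X_n\in A\mid T=t)\ \text{does not depend on}\ \theta,$$ so that once you know $T$, the rest of the sample carries no further information about $\theta$.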

As for your specific example, a simple way to understand why $\bar X$ is not a sufficient statistic for $\theta$ is to consider the following experiment: suppose I tell you $\bar X = 10$. Is this equivalent to all the information pertaining to $\theta$ that we can get from the sample? Of course not: for instance, $X_1 = 9.5, X_2 = 10.4, X_3 = 10.1$ could give us $10$, but so could $X_1 = X_2 = 9.75, X_3 = 10.5$. If you only have knowledge of $\bar X$, you have lost information about $\theta$ that was available in the original sample: namely, that in the first case we must have $\theta \in [9.4,9.5]$, and in the second, $\theta \in [9.5,9.75]$. Notice those intervals overlap only at the single point $9.5$. This is why $\bar X$ is not sufficient for $\theta$. You may be able to use it to estimate $\theta$ when $n$ is large, but as I have pointed out, sufficiency has to do with data reduction, not estimation.
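
As a quick numerical check of those intervals, here is a small Python sketch (the sample values are the ones above; the rest is just the support constraint $\theta \le x_i \le \theta+1$ rearranged):

```python
samples = {
    "sample 1": [9.5, 10.4, 10.1],
    "sample 2": [9.75, 9.75, 10.5],
}

for name, xs in samples.items():
    mean = sum(xs) / len(xs)
    # theta <= min(x) and max(x) <= theta + 1 together give
    # theta in [max(x) - 1, min(x)].
    lo, hi = max(xs) - 1, min(xs)
    print(f"{name}: mean = {mean:g}, theta in [{lo:.2f}, {hi:.2f}]")
```

Both samples report the same $\bar X$, yet they pin $\theta$ into intervals that meet only at $9.5$; reducing the data to $\bar X$ alone discards exactly that distinction.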

heropup