
The definition of a sufficient statistic is: Let $X_1,...,X_n$ be a random sample from a distribution indexed by a parameter $\theta$. Let $T$ be a statistic. Suppose that, for every $\theta$ and every possible value $t$ of $T$, the conditional joint distribution of $X_1,...,X_n$ given that $T=t$ depends only on $t$ but not on $\theta$. Then, $T$ is a sufficient statistic for parameter $\theta$.
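For concreteness, here is a standard textbook example I have seen (Bernoulli data with $T$ the sample sum): let $X_1,\dots,X_n$ be i.i.d. Bernoulli($\theta$) and $T=\sum_{i=1}^n X_i$. For any $x_1,\dots,x_n\in\{0,1\}$ with $\sum_i x_i = t$,
$$P_\theta(X_1=x_1,\dots,X_n=x_n \mid T=t)=\frac{\theta^{t}(1-\theta)^{n-t}}{\binom{n}{t}\theta^{t}(1-\theta)^{n-t}}=\frac{1}{\binom{n}{t}},$$
which does not involve $\theta$, so $T$ is sufficient by the definition above.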

I feel like I know several pieces of the puzzle (like the factorization theorem) for understanding sufficient statistics, but I don't have the overall theory down.

My main questions are:

1) Why do they say that $T$ is a sufficient statistic for the parameter $\theta$? If $\theta$ were the population mean of a normal distribution, say $\mu$, does it mean that any time we want to find the probability of, say, $X_1,...,X_n$ occurring in a certain way, we don't need the value of the population mean?

2) In real life, why do we want to use a sufficient statistic? It seems that just calculating the statistic (such as the sum of the $X$'s) shouldn't be that much work, so why do we need it?

Thanks!

user123276

1 Answer

  1. No. What they say is this: because the conditional distribution of $X_1,\dots,X_n$ given $T=t$ does not depend on $\theta$, we can draw a new sample $X_1^\prime,\dots,X_n^\prime$ from that conditional distribution without knowing $\theta$, and this new sample carries exactly the same amount of probabilistic information about $\theta$ as the original data. In that sense we can "recover the data" if we retain $T$ and discard $X_1,\dots,X_n$; that is why $T$ is "sufficient" (a small simulation sketch after this list illustrates the idea).

  2. Data reduction. If $T$ is sufficient, the "extra information" carried by the data $X$ is worthless as far as $\theta$ is concerned. It is then only natural to consider inference procedures that do not use this extra, irrelevant information. This leads to the Sufficiency Principle: any inference procedure should depend on the data only through sufficient statistics.
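Here is a minimal simulation sketch of both points (in Python with NumPy; the Bernoulli model, the variable names, and the seed are illustrative assumptions on my part, not something from the question): conditional on $T$, a fresh sample can be generated without knowing $\theta$, and the MLE computed from either sample is identical because it depends on the data only through $T$.

```python
import numpy as np

# Sketch for i.i.d. Bernoulli(theta) data with T = sum of the X's, which is
# sufficient for theta. It illustrates points 1 and 2 above.

rng = np.random.default_rng(0)
theta_true = 0.3          # unknown in practice; used here only to simulate data
n = 20

x = rng.binomial(1, theta_true, size=n)   # original sample
t = x.sum()                               # sufficient statistic T

# 1) "Recover the data": conditional on T = t, every arrangement of t ones and
#    (n - t) zeros is equally likely (probability 1 / C(n, t)), so a new sample
#    can be drawn from this conditional distribution without knowing theta.
x_new = rng.permutation(np.r_[np.ones(t, dtype=int), np.zeros(n - t, dtype=int)])

# 2) The likelihood depends on the sample only through T, so both samples give
#    the same MLE of theta.
mle_original = x.mean()
mle_regenerated = x_new.mean()
print(t, x_new.sum(), mle_original, mle_regenerated)   # same T, same MLE
```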

See here for more detail on principles involved in data reduction.

Hibernating