
As I began my study of sufficient statistics, I stumbled upon a definition that puzzled me. The conditional probability distribution of the sample values given an estimator $\hat{\Theta}=\hat{\theta}$ is given by

$$ f\left( x_1,x_2,\ldots,x_n|\hat{\theta} \right) = \frac{f \left(x_1,x_2,\ldots,x_n,\hat{\theta} \right)}{g\left( \hat{\theta} \right)}=\frac{f\left( x_1,x_2,\ldots,x_n \right) }{g \left(\hat{\theta} \right)} $$

The first equality is of course the definition of the conditional distribution $P \left(A| B \right) = \frac{P\left( A \cap B \right)}{P \left( B \right)} $. What I do not understand is where the numerator in the second equality comes from.

It looks like we are assuming that $A \subset B \Rightarrow A \cap B =A $. But how is that possible in our case? Any insight on that? Thank you!

JohnK
  • Are these random variables discrete? Otherwise the first equality does not necessarily make sense. Typically $(X_1, \ldots, X_n \mid \hat \Theta)$ will not admit an $n$-dimensional density, nor will $(X_1, \ldots, X_n, \hat \Theta)$ admit an $n+1$-dimensional density. – guy Oct 30 '13 at 19:07
  • @guy Not necessarily discrete but why do you think it does not make sense? – JohnK Oct 30 '13 at 19:14
  • If $X_1, \ldots, X_n \sim N(\mu, 1)$ then $\bar X$ is sufficient for $\mu$. Does it make sense to write $f(x_1, \ldots, x_n, \bar x)$? It only makes sense if the density is defined on $\mathbb R^n$ rather than $\mathbb R^{n+1}$. You can't just appeal to your definition of conditional probability, because these are density functions; they don't have sets as their arguments. – guy Oct 30 '13 at 19:37

1 Answer


In short, the value of a statistic is completely determined by the observed data: given the values $x_1,\ldots,x_n$, the sufficient statistic takes its particular value with probability one, since $\hat \theta = h(x_1,\ldots,x_n)$ for some function $h$.
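To see where the numerator comes from, here is a sketch of the discrete case, writing $\hat{\theta} = h(x_1,\ldots,x_n)$ as above (the continuous case needs the extra care mentioned in the comments):

$$ f\left(x_1,\ldots,x_n,\hat{\theta}\right) = P\left(X_1=x_1,\ldots,X_n=x_n,\ \hat{\Theta}=\hat{\theta}\right) = P\left(X_1=x_1,\ldots,X_n=x_n\right)\,\mathbf{1}\left\{ h(x_1,\ldots,x_n)=\hat{\theta} \right\} $$

Whenever $\hat{\theta}$ is the value actually computed from the observed data, the indicator equals $1$ and the joint probability reduces to $f(x_1,\ldots,x_n)$. This is exactly the event containment $\left\{ X_1=x_1,\ldots,X_n=x_n \right\} \subset \left\{ \hat{\Theta}=\hat{\theta} \right\}$ suspected in the question.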

Theoretically, a sufficient statistic "encapsulates" all the information in your data about a particular parameter, so the conditional distribution of the data given the statistic no longer depends on the parameter being estimated. In actual calculations, $\theta$ will drop out of your final formula.
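As a concrete sketch of that last point, take i.i.d. Bernoulli($p$) data with $T = \sum_i X_i$ as the sufficient statistic: conditional on $T = t$, every arrangement of the $t$ successes should be equally likely, with probability $1/\binom{n}{t}$ that does not involve $p$. A rough simulation (the setup and names here are just for illustration) checks this for two different values of $p$:

```python
import random
from collections import Counter

# Sketch: X_1,...,X_n i.i.d. Bernoulli(p); T = X_1 + ... + X_n is sufficient for p.
# Conditional on T = t, each of the C(n, t) arrangements of t ones should appear
# with probability 1/C(n, t), no matter what p is.

random.seed(0)
n, t, reps = 4, 2, 200_000          # C(4, 2) = 6, so each pattern should be near 1/6

for p in (0.3, 0.7):                # two different parameter values
    counts, kept = Counter(), 0
    for _ in range(reps):
        x = tuple(1 if random.random() < p else 0 for _ in range(n))  # one sample
        if sum(x) == t:             # keep only samples with T = t
            counts[x] += 1
            kept += 1
    print(f"p = {p}:")
    for pattern, c in sorted(counts.items()):
        print(f"  P(X = {pattern} | T = {t}) ≈ {c / kept:.3f}")
```

For both values of $p$ the six conditional frequencies come out close to $1/6$: the parameter has dropped out, which is the behaviour described above.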

  • I don't think this is it. As I wrote above, my view is that we have to consider the estimator as the whole set and the sample values as a subset, so that the conditional probability takes that form. – JohnK Oct 30 '13 at 14:13
  • I think you were reading an earlier version of my answer. See corrected version. –  Oct 30 '13 at 14:14
  • If I had to guess, I would say that the numerical value of our estimator can result from many combinations of our sample values, which is why they are a subset of the estimator, but it's still not clear to me. – JohnK Oct 30 '13 at 14:19
  • It's simpler than that. $\hat \theta$ is a deterministic function of the observed data, so it is not random once you already know what data you have. If all you know is that the statistic has a particular value, then, as you indicated, there are only so many combinations that allow this, one of which will always be the data you observed and used to calculate the statistic. So yes, you can view the observed data as a subset of the possible data sets that could have produced the observed statistic value. –  Oct 30 '13 at 14:25