Sufficient statistic for sample without replacement

Question

This is an exercise that I faced today:

Let $\mathcal{M}$ be a population of 'individuals' with identification numbers in $\mathbb{Z}$. We assume that the set of all identification numbers equals $\{a, \ldots, b\}$ with unknown integers $a, b$. We only know that $b - a + 1 \geq n \in \mathbb{N} \setminus \{1\}$.

Now we draw a random sample of size $n$ without replacement from $\mathcal{M}$ and note the tuple $\omega = (\omega_1, \omega_2, \ldots, \omega_n)$ of identification numbers. Show that $S(\omega) = \left( \min(\omega), \max(\omega) \right)$ is a sufficient statistic for this experiment. Describe the conditional distribution of $\omega$, given $S(\omega) = (s_1, s_2)$.

First, I define:

$$\Omega=\{ \omega \in \mathbb{Z}^{n}: \omega_i \neq \omega_j, \text{whenever}\ i \neq j \}$$

$$P_{a,b}=\text{Unif}(\ \Omega \ \cap \{ a,...,b \}^n), $$

$$\Theta=\{ (a,b) \in \mathbb{Z}^{2} : b-a+1 \geq n \}.$$

score 1 · Accepted Answer · answered Jul 12 '18 at 02:34

Part of this problem is just a matter of reframing the question so that it is not a notational nightmare:

Reframed question: Consider a population set containing $N \in \mathbb{N}$ consecutive integers:

$$\mathcal{M} \equiv \{ a, a+1, a+2, ..., a + N-1 \},$$

where $a \in \mathbb{Z}$ is the smallest integer in the set. (In this re-framing of the problem, we have the equivalence $b = a+N-1$ with your framing.) We sample $n \leqslant N$ values from this population using simple random sampling without replacement (SRSWOR), yielding the following observed data vector and statistic:

$$\mathbf{x} = (x_1, ..., x_n) \quad \quad \quad s(\mathbf{x}) = ( x_{(1)}, x_{(n)} ).$$

We want to show that $s(\mathbf{x})$ is sufficient for the parameter vector $(a,N)$ (which fully describes the population set), where we note that we must have $a \leqslant x_{(1)} \leqslant x_{(n)} \leqslant a+N-1$.

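Solution: Conditional on the minimum sample value $x_{(1)}$ and the maximum sample value $x_{(n)}$, the remaining $n-2$ sample values are SRSWOR from the population values within the restricted range $x_{(1)} < x < x_{(n)}$. (This fact is left as an exercise for the OP to prove, but it is intuitively obvious.) Each of the ${x_{(n)} - x_{(1)} - 1 \choose n-2}$ possible sets of interior values is therefore equally likely, and since $\mathbf{x}$ is an ordered vector, each of the $n!$ orderings of the sampled values is equally likely as well. We therefore have:

$$p( \mathbf{x} | s(\mathbf{x}), a, N ) = p( \mathbf{x} | x_{(1)}, x_{(n)}, a, N ) = 1 \Big/ \left[ n! {x_{(n)} - x_{(1)} - 1 \choose n-2} \right] = p( \mathbf{x} | x_{(1)}, x_{(n)}) .$$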
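One way to fill in the counting step, as a sketch: the shorthand $s_1 = x_{(1)}$, $s_2 = x_{(n)}$ and $C$ is introduced here for convenience and is not part of the original answer. Under SRSWOR every ordered $n$-tuple of distinct population values has the same probability $(N-n)!/N!$. For fixed $a \leqslant s_1 < s_2 \leqslant a+N-1$ with $s_2 - s_1 \geqslant n-1$, a tuple with minimum $s_1$ and maximum $s_2$ is obtained by choosing the $n-2$ interior values and then ordering all $n$ values, so the number of such tuples is

$$C = n! {s_2 - s_1 - 1 \choose n-2}.$$

Dividing the probability of one tuple by the total probability of all tuples with the same minimum and maximum gives

$$p( \mathbf{x} | s_1, s_2, a, N ) = \frac{(N-n)!/N!}{C \cdot (N-n)!/N!} = \frac{1}{C},$$

in which $a$ and $N$ cancel. If the sample is instead recorded as an unordered set, the factor $n!$ drops out and each set has conditional probability $1 \big/ {s_2 - s_1 - 1 \choose n-2}$.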

Since this conditional probability does not depend on the parameter $(a,N)$, it follows by the definition of sufficiency that $s(\mathbf{x})$ is sufficient for $(a,N)$.
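As a sanity check rather than a proof, the parameter-free conditional distribution can be confirmed by brute-force enumeration on small cases. The sketch below is illustrative only: the function name `conditional_pmf` and the particular values of $a$, $N$, $n$ and the conditioning pair are assumptions chosen for the example. It treats the sample as an ordered tuple, so the common conditional probability is $1 \big/ \left[ n! {s_2 - s_1 - 1 \choose n-2} \right]$ (equivalently $1 \big/ {s_2 - s_1 - 1 \choose n-2}$ per unordered set of values).

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def conditional_pmf(a, N, n, s1, s2):
    """Exact conditional pmf of the ordered sample given (min, max) = (s1, s2).

    The population is {a, a+1, ..., a+N-1}. Under SRSWOR every ordered
    n-tuple of distinct values is equally likely, so conditioning reduces
    to counting the tuples with the required minimum and maximum.
    """
    population = range(a, a + N)
    matching = [x for x in permutations(population, n)
                if min(x) == s1 and max(x) == s2]
    return {x: Fraction(1, len(matching)) for x in matching}


# Hypothetical example values: same (min, max), two different populations.
n, s1, s2 = 3, 4, 8
pmf_small = conditional_pmf(a=3, N=7, n=n, s1=s1, s2=s2)    # population {3, ..., 9}
pmf_large = conditional_pmf(a=0, N=12, n=n, s1=s1, s2=s2)   # population {0, ..., 11}

# Closed-form value for an ordered tuple.
expected = Fraction(1, factorial(n) * comb(s2 - s1 - 1, n - 2))

print(pmf_small == pmf_large)                            # same distribution for both (a, N)
print(all(p == expected for p in pmf_small.values()))    # matches the closed form
```

Both populations contain the values 4 through 8, so the two conditional distributions are supported on the same tuples; the comparison shows that they also assign identical probabilities, which is what sufficiency requires.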

Remark: This sufficiency problem essentially comes down to recognising that under SRSWOR, the conditional distribution of the middle values, given the minimum and maximum, is still SRSWOR within the restricted range bounded by those values. Although the population set here contains consecutive integers, this restriction does not contribute to the problem. The result would be the same if the population set consisted of any arbitrary finite set of numbers with bounds determined by the parameters.
