
(I am aware of a similar question here, but I feel its answer is too open-ended.)

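Assume the population is unchanged between the two sampling stages, sampling is without replacement, and the second stage draws the additional $n_{full}-n_{pilot}$ units from the $N-n_{pilot}$ units not selected for the pilot. Under those assumptions $S_{full}$ below denotes only these additional units, and the probabilities of selection at each stage are: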

$$P(x_i \in S_{pilot}) = \frac{n_{pilot}}{N}$$

$$P(x_i \in S_{full}) = \frac{n_{full}-n_{pilot}}{N}$$
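The second-stage probability can be obtained by conditioning on the unit not having been picked for the pilot. This assumes the second stage is a simple random sample of $n_{full}-n_{pilot}$ units from the $N-n_{pilot}$ units left over after the pilot:

$$P(x_i \in S_{full}) = P(x_i \notin S_{pilot})\,P(x_i \in S_{full} \mid x_i \notin S_{pilot}) = \frac{N-n_{pilot}}{N}\cdot\frac{n_{full}-n_{pilot}}{N-n_{pilot}} = \frac{n_{full}-n_{pilot}}{N}$$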

Because the events are mutually exclusive (a unit already in the pilot cannot be drawn again),

$$P(x_i \in S_{pilot} \cup S_{full}) = \frac{n_{pilot}}{N} + \frac{n_{full}-n_{pilot}}{N} = \frac{n_{full}}{N}$$

and so each unit's probability of selection is the same as it would be if a single sample of size $n_{full}$ had been drawn from the population.
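For a small population the arithmetic can be checked by brute force. The sketch below enumerates every possible pilot and every possible second-stage sample and computes exact probabilities with `fractions.Fraction`. It assumes, as above, that the second stage draws from the units not chosen for the pilot. The function name `stage_probabilities` and the example sizes are illustrative, not part of the original question:

```python
from fractions import Fraction
from itertools import combinations


def stage_probabilities(N, n_pilot, n_full, unit=0):
    """Exact selection probabilities for one unit under the assumed two-stage design.

    Stage 1: n_pilot units drawn without replacement from range(N).
    Stage 2: n_full - n_pilot units drawn without replacement from the
    N - n_pilot units not chosen in stage 1 (assumed design).
    Returns (P(in pilot), P(in second stage), P(in either)).
    """
    population = range(N)
    pilots = list(combinations(population, n_pilot))
    p_pilot = p_second = p_either = Fraction(0)
    for pilot in pilots:
        remaining = [u for u in population if u not in pilot]
        seconds = list(combinations(remaining, n_full - n_pilot))
        # Every pilot is equally likely, and every second-stage sample is
        # equally likely given the pilot.
        p_each = Fraction(1, len(pilots) * len(seconds))
        for second in seconds:
            if unit in pilot:
                p_pilot += p_each
            if unit in second:
                p_second += p_each
            if unit in pilot or unit in second:
                p_either += p_each
    return p_pilot, p_second, p_either


p_pilot, p_second, p_either = stage_probabilities(6, 2, 4)
print(p_pilot, p_second, p_either, Fraction(4, 6))  # prints: 1/3 1/3 2/3 2/3
```

Because the pilot and second-stage events never overlap here, `p_either` equals `p_pilot + p_second`, matching the sum in the equation above. Exhaustive enumeration is only practical for tiny $N$, but it avoids the sampling noise of a Monte Carlo check.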

Is this correct or am I overlooking something?

  • If you only continue to collect data after the pilot when the pilot shows indications of some desired result, then your full sample could be biased if you included the pilot data. There might be no true effect, but the pilot could show an apparent effect by chance. How do you think this would affect the kinds of samples you would see? – Marius Apr 19 '17 at 00:56
  • Right, this would bias estimates towards the desired result. In this case, however, the pilot is used only to obtain reference values for calculating the full sample size; the second stage isn't conditional on the values observed in the first. – overdisperse Apr 19 '17 at 00:59
