
In my basic statistics course we always assume that the sample observations are iid. I get why they would be iid (e.g. in the case of a coin toss), and I also intuitively get that if they are not iid then the observations we get would probably have some bias or inaccuracies. But is it possible to prove that mathematically?

Thanks!

Carrie
  • 21

1 Answer


Consider an urn with 3 red balls and 4 green balls. The experiment consists of drawing two balls in succession from the urn, with all balls in the urn being equally likely to be chosen. Let $X$ be a Bernoulli random variable that has value $1$ if and only if the first ball drawn is red, and $Y$ a Bernoulli random variable that has value $1$ if and only if the second ball drawn is red.

Sampling with replacement: Suppose that the experimenter draws one ball from the urn, notes its color, returns the ball to the urn, shakes well (this is to ensure that the ball just tossed back in is not sitting on top of the pile and so very likely to be drawn again), and then draws a second ball.

It is easy to determine that $P(X=1) = P(Y=1) = \frac 37$, that is, $X$ and $Y$ are identically distributed. It is also easy to verify that they are independent: for example, $P(X=1, Y=1) = \frac{9}{49} = P(X=1)P(Y=1)$. So $X$ and $Y$ are independent identically distributed random variables.
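For anyone who would rather check this by brute force, here is a minimal sketch that lists all $49$ equally likely ordered pairs and computes the probabilities exactly. The ball labels and the helper `prob` are made up for illustration. For two Bernoulli variables, checking the product rule for the single event $\{X=1, Y=1\}$ is enough to establish independence.

```python
from fractions import Fraction
from itertools import product

# Hypothetical labels: three red balls and four green balls.
balls = ["R1", "R2", "R3", "G1", "G2", "G3", "G4"]

# With replacement, every ordered pair (first, second) is equally likely.
outcomes = list(product(balls, repeat=2))

def prob(event):
    """Exact probability of an event over the equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_x = prob(lambda o: o[0].startswith("R"))   # P(X = 1)
p_y = prob(lambda o: o[1].startswith("R"))   # P(Y = 1)
p_xy = prob(lambda o: o[0].startswith("R") and o[1].startswith("R"))

print(len(outcomes), p_x, p_y, p_xy)   # 49 3/7 3/7 9/49
print(p_xy == p_x * p_y)               # True: the joint probability factors
```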

Sampling without replacement: The first draw is as described above, but after the color of the first ball has been noted, the ball is not returned to the urn (so the urn now has only six balls in it), and (after shaking well again) the second ball is drawn. Clearly $P(X=1) = \frac 37$ as before. But what freaks beginning students out is that $P(Y=1)$ also equals $\frac 37$! Note that the experiment has $7\times 6 = 42$ equally likely outcomes instead of the $49$ outcomes in the sampling with replacement described above, but if you make a list of all $42$ outcomes, $18$ of them are of the form $(\star_i, R_i)$ where $R_i$ is the $i$-th red ball, $i = 1,2,3,$ while $\star_i$ is any of the six balls other than $R_i$. So $$P(Y=1) = P(\text{second ball red}) = \frac{18}{42}=\frac 37$$ as claimed. Thus, $X$ and $Y$ are identically distributed random variables, but they are not independent random variables. Note that $$P(X=1, Y=1) = P((R_i,R_j),\ i \neq j) = \frac{6}{42} = \frac 17 \neq \frac{9}{49} = P(X=1)P(Y=1).$$
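The list of $42$ outcomes is short enough to generate by machine. The sketch below (ball labels are again hypothetical) counts the outcomes directly, so the $18$ and the $6$ can be read off rather than taken on faith.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical labels: three red balls and four green balls.
balls = ["R1", "R2", "R3", "G1", "G2", "G3", "G4"]

# Without replacement, the outcomes are ordered pairs of distinct balls.
outcomes = list(permutations(balls, 2))

first_red = [o for o in outcomes if o[0].startswith("R")]
second_red = [o for o in outcomes if o[1].startswith("R")]
both_red = [o for o in second_red if o[0].startswith("R")]

p_x = Fraction(len(first_red), len(outcomes))    # P(X = 1)
p_y = Fraction(len(second_red), len(outcomes))   # P(Y = 1)
p_xy = Fraction(len(both_red), len(outcomes))    # P(X = 1, Y = 1)

print(len(outcomes), len(second_red), len(both_red))   # 42 18 6
print(p_x, p_y, p_xy)                                  # 3/7 3/7 1/7
print(p_xy == p_x * p_y)                               # False: not independent
```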

As an example of random variables that are independent but not identically distributed, let $Z=1-Y$ be a Bernoulli random variable that has value $1$ if and only if the second ball is green. In sampling with replacement, it is easily verified that $X$ and $Z$ are independent but not identically distributed while in sampling without replacement, $X$ and $Z$ are neither independent nor identically distributed.
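Both claims about $X$ and $Z$ can be checked the same way. This is only a sketch: `summarize` is a made-up helper that returns $P(X=1)$, $P(Z=1)$, and whether the joint probability $P(X=1, Z=1)$ factors, which for Bernoulli variables is equivalent to independence.

```python
from fractions import Fraction
from itertools import permutations, product

# Hypothetical labels: three red balls and four green balls.
balls = ["R1", "R2", "R3", "G1", "G2", "G3", "G4"]

def summarize(outcomes):
    """Return (P(X=1), P(Z=1), independent?) for equally likely outcomes."""
    n = len(outcomes)
    p_x = Fraction(sum(o[0][0] == "R" for o in outcomes), n)   # first ball red
    p_z = Fraction(sum(o[1][0] == "G" for o in outcomes), n)   # second ball green
    p_xz = Fraction(sum(o[0][0] == "R" and o[1][0] == "G" for o in outcomes), n)
    return p_x, p_z, p_xz == p_x * p_z

with_repl = summarize(list(product(balls, repeat=2)))
without_repl = summarize(list(permutations(balls, 2)))

print(with_repl)      # (Fraction(3, 7), Fraction(4, 7), True)
print(without_repl)   # (Fraction(3, 7), Fraction(4, 7), False)
```

In both schemes $P(X=1) = \frac 37 \neq \frac 47 = P(Z=1)$, so $X$ and $Z$ are never identically distributed; only the independence flag changes with the sampling scheme.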


In short, whether the iid assumption is valid totally, or in part (the i but not the id, or vice versa), or not at all (neither the i nor the id) depends on how the experiment is carried out. This is not something that can be proved mathematically; rather, it is an issue of modeling: how we translate the grubby facts of coins that have been tossed so often that it is hard to distinguish Heads from Tails, and unshaken urns that nobody told us about, into mathematical symbols and notations where we can apply a purely intellectual approach.

Dilip Sarwate
  • 46,658