
I am confused about how a normal distribution is drawn. Are the elements drawn with or without replacement?

RSol
  • You seem to be confusing basic concepts of discrete and continuous probability distributions. Check Wikipedia or some textbooks (https://stats.stackexchange.com/questions/170/free-statistical-textbooks); they have better descriptions than what could be found here. – Martin Modrák Jul 30 '18 at 11:06
  • "Normal distribution" describes the numbers written on a collection of balls in a jar, regardless of how you plan to sample those balls. – whuber Jul 30 '18 at 14:16
  • This question isn't as silly as it seems. We also have a related question, "Does rnorm produce numbers with replacement/without replacement?", that is well received. Also, the question asker is confused and asks about that confusion. The fact that the question is confusing should not be a reason to consider it not useful. The question is about the confusion. – Sextus Empiricus Dec 26 '23 at 21:39

2 Answers


The question is somewhat ill-posed, because the type of sampling depends on your needs, but it does raise an interesting point. We extend the discussion to discrete and continuous random variables in general, since that is the more general topic.

In general, you sample with replacement when you need draws that are independent, and you sample without replacement when you need dependence between draws for some purpose.
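As a minimal sketch of the two schemes on a finite, enumerated population (the population values and the seed are made up for illustration), Python's standard `random.choices` draws with replacement and `random.sample` draws without replacement:

```python
import random

rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility

population = [1, 2, 3, 4, 5]  # hypothetical enumerated population

# With replacement: every draw sees the full population, so the draws are
# independent and the same value can come up more than once.
with_repl = rng.choices(population, k=8)

# Without replacement: a drawn element is removed, so later draws depend on
# earlier ones and no element can be drawn twice.
without_repl = rng.sample(population, k=5)

print(with_repl)
print(without_repl)
```

Eight draws with replacement from five values must contain a repeat, while the five draws without replacement are just a shuffle of the population.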

Let us continue on the technical side. When you have the whole population at hand, specified by enumerating all of its values, then for multiple draws you can choose whether or not to replace each previously drawn value. But what if you do not have the whole population available, only a functional form of it? Then the question is how you would keep a drawn value out of later draws. The only way is to remember it somehow, but even if you remember it, how do you proceed?

Let's take a continuous distribution. You draw a value and want to remember it. What you remember is a finite floating-point representation, yet infinitely many real values share each such representation. If you forbid redrawing what you remembered, you forbid the whole infinite set of values with that representation, and your sample is simply no longer from the same distribution. So for continuous variables it is close to impossible to implement sampling 'without replacement', and even if you did, it would be practically useless, since for most applications there would be no notable difference.

Let's take a discrete distribution, Binomial(10,0.5). How many observations are in your population? 10 at each draw. You draw a value from that distribution, say it is 5. The question is: what would it mean to remember 5 and not draw it again after the first draw? It certainly does not mean 'without replacement', because you would no longer obtain a value from the population described by one draw of Binomial(10,0.5); it is something else. So you can say that drawing samples from Binomial(10,0.5) can be done only with replacement.

The hypergeometric distribution, in contrast, assumes a finite population with a fixed composition, such as 10 balls, 3 black and 7 red, where you count the reds in 3 draws. In such cases the drawing is without replacement by construction.
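For the urn just described, the probabilities can be written out directly. This is only a small sketch using the standard counting formula for the hypergeometric distribution; the names `N_BALLS`, `N_RED`, `N_DRAWS` and `hypergeom_pmf` are mine, not from any library:

```python
from fractions import Fraction
from math import comb

N_BALLS, N_RED, N_DRAWS = 10, 7, 3  # 10 balls: 7 red, 3 black; 3 draws


def hypergeom_pmf(k):
    """P(exactly k reds in N_DRAWS draws without replacement)."""
    n_black = N_BALLS - N_RED
    # Ways to pick k reds and the remaining draws from the blacks,
    # divided by all ways to pick N_DRAWS balls out of N_BALLS.
    return Fraction(comb(N_RED, k) * comb(n_black, N_DRAWS - k),
                    comb(N_BALLS, N_DRAWS))


pmf = {k: hypergeom_pmf(k) for k in range(N_DRAWS + 1)}
for k, prob in pmf.items():
    print(k, prob)
```

The formula counts subsets of the urn, which is exactly where the 'without replacement' is built in.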

In conclusion: if the population is finite and enumerated, or given in an equivalent form, then you can sample either with or without replacement. Some discrete variables on finite populations assume one way of sampling by construction. For continuous distributions or infinite populations, sampling without replacement is simply impossible or impractical to implement, and wherever it can be implemented it produces very similar results. In those cases the sampling can be considered to be with replacement.

rapaio

Replacement is first of all a property of the random 'process', not of the 'distribution' that describes the process.

With and without replacement refer to a random process in which values are drawn from some population and, after each draw, the drawn value is either 'returned to the population' or 'removed from the population'.

Probability distributions do not have this property of being drawn with or without replacement. A distribution can relate to such a process, but it is not equivalent to it.

Connection between distributions and random processes

As an example of how 'distributions' might be connected to a 'random process': consider the case where we draw some number of balls from an urn with red and blue balls. Then

  • the binomial distribution is a distribution related to the number of red balls when we draw with replacement
  • the hypergeometric distribution is a distribution related to the number of red balls when we draw without replacement.
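A rough simulation sketch of the two processes, assuming a hypothetical urn of 7 red and 3 blue balls, 3 draws per experiment, and an arbitrary seed (none of these numbers come from the text above):

```python
import random
from statistics import mean, pvariance

rng = random.Random(1)            # arbitrary seed for reproducibility
urn = ["red"] * 7 + ["blue"] * 3  # hypothetical urn composition
n_draws, n_reps = 3, 20_000


def reds_with_replacement():
    # Each ball goes back before the next draw: a binomial count.
    return sum(ball == "red" for ball in rng.choices(urn, k=n_draws))


def reds_without_replacement():
    # Drawn balls stay out of the urn: a hypergeometric count.
    return sum(ball == "red" for ball in rng.sample(urn, k=n_draws))


# Every repetition starts again from the full urn.
binom_counts = [reds_with_replacement() for _ in range(n_reps)]
hyper_counts = [reds_without_replacement() for _ in range(n_reps)]

print(mean(binom_counts), pvariance(binom_counts))
print(mean(hyper_counts), pvariance(hyper_counts))
```

Both counts have the same expected value (3 × 0.7 = 2.1 here), but the count without replacement has the smaller variance. Note that the list of hypergeometric counts contains many repeated values, because each experiment is restarted from the full urn.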

But this hypergeometric distribution can itself be sampled repeatedly (we can repeat the sampling process, each time beginning from the starting situation) and is itself not a distribution that has the property of being drawn without replacement.

A distribution can be related to sampling with replacement, but that does not mean that a sample 'drawn from that distribution' is itself something that is drawn with replacement.

To what sort of sampling (with/without replacement) does a normal distribution relate?

A normal distribution does not relate specifically to either process, with or without replacement. Both processes can relate to a normal distribution, since the normal distribution is more of a theoretical concept: the limiting distribution, at infinity, of a finite process. For example, consider drawing two binomially distributed variables $X(n,p)$ and $Y(n,p)$ with the restriction that $Y$ is redrawn whenever it is the same as $X$. This is drawing without replacement, but both variables can still be modelled as normally distributed variables. E.g., we still have that the normalized variable approaches a normal distribution: $\frac{Y(n,p)-np}{\sqrt{np(1-p)}} \to N(0,1)$.
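A quick numerical check of this example, as a sketch only: the values of `n`, `p`, the number of repetitions and the seed are arbitrary choices of mine, and the binomial draw is built from Bernoulli trials with the standard library.

```python
import math
import random

rng = random.Random(2)          # arbitrary seed for reproducibility
n, p, n_reps = 200, 0.5, 3000   # illustrative values


def binomial_draw():
    # One Binomial(n, p) value as a sum of n Bernoulli(p) trials.
    return sum(rng.random() < p for _ in range(n))


def draw_y_excluding_x():
    # Draw X, then redraw Y until it differs from X.
    x = binomial_draw()
    y = binomial_draw()
    while y == x:
        y = binomial_draw()
    return y


sd = math.sqrt(n * p * (1 - p))
z_values = [(draw_y_excluding_x() - n * p) / sd for _ in range(n_reps)]

# For a standard normal variable about 95% of the mass lies in [-1.96, 1.96].
frac_within = sum(abs(z) <= 1.96 for z in z_values) / n_reps
print(frac_within)
```

Despite the restriction, the fraction should come out close to the normal value of 0.95, up to the discreteness of the binomial and simulation noise.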

An interesting related question is "Does rnorm produce numbers with replacement/without replacement?"; the answers to it actually answer the question stated here. But rnorm and runif are random number generators, not distributions. Such generators could in principle be built so that they never repeat a value (until all numbers that the computer has available are used up and the generator necessarily needs to start over). But they are not, and numbers do repeat (before the cycle starts over), as explained in the question "R: Problem with runif: generated number repeats (more often than expected) after less than 100 000 steps".

  • By the time I reached "...not a distribution that has the property of being drawn without replacement" I was thoroughly confused. Let me approach that confusion this way: given a "distribution" (which I presume is a mathematical description of the distribution of some random variable), exactly how would I test whether it has the "property of being drawn without replacement"?? – whuber Dec 27 '23 at 16:08
  • @whuber I start to like it when you get confused by my posts and mention that they are confusing. I will review my post and try to improve it. – Sextus Empiricus Dec 27 '23 at 18:11
  • @whuber my point is that distributions don't have this property. While a distribution might be seen as connected to a process that has the property, it is not the distribution that has the property; it only describes the process that does. – Sextus Empiricus Dec 27 '23 at 18:39
  • That's likely why I found your phrase so confusing! – whuber Dec 27 '23 at 20:55