
Recently I encountered the formula for the finite population variance in Davison and Hinkley's book "Bootstrap Methods and their Application" (see screenshot below). The formula includes $(N - 1)^{-1}$ rather than $N^{-1}$ (as it should be according to Wikipedia). I understand that the $(n - 1)^{-1}$ factor appears in the sample variance estimate (i.e. for a sample taken from the finite population), but here we are dealing with the entire population.

The source of bias in the sample-based estimate is the uncertainty introduced by the mean estimate, which is itself a random variable; but if we're talking about the entire population, the mean obtained from the population is a fixed quantity and therefore should not affect the variance.

I checked a couple of articles online and noticed the same definition in some of them as well.

Some explanation can be found in another question posted in 2013: What is the difference between N and N-1 in calculating population variance?. It seems like one reasonable explanation is that the $(N-1)^{-1}$ factor is computationally more convenient (although I don't see any good examples that support the claim). That would mean the definition of the variance was changed for this particular application: it's no longer the variance in the usual sense, but rather a variant discrepancy measure that quite often gives answers very close to the actual variance. The problem is that the sample variance is then biased with respect to the new variance definition, since it's missing the $N(N-1)^{-1}$ scaling factor.

I would appreciate it if somebody could help me clarify these points.

[Screenshot of Davison & Hinkley, equation (3.15): the variance of $\bar{Y}$ under sampling with and without replacement, written in terms of a quantity $\gamma$ that is defined with an $(N-1)^{-1}$ factor.]

itdxer
  • I wonder what your text really means by "var," because with the standard definition (second central moment), the "with replacement" part of (3.15) obviously is incorrect. Consider samples of size $n=1,$ for which the variance of $\bar Y$ reduces to the variance of the population, which requires a denominator of $N,$ not $N-1,$ in its definition. – whuber Sep 14 '22 at 16:00

2 Answers


When we are sampling with replacement, the population size is irrelevant.

Simple counterexample: imagine an urn model with $k$ red balls and $k$ blue balls. The probability of drawing a red/blue ball is 0.5, independent of the total number of balls $N = 2k$ in the population. So the variance of any sample from that urn should be independent of the population size.
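A quick Monte Carlo check of this point (a minimal Python sketch; the urn sizes, sample size, and replicate count are arbitrary choices of mine, not anything from the quoted text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10  # sample size (arbitrary)

# Urns with k red balls (coded 1) and k blue balls (coded 0),
# giving population sizes N = 2k.
for k in [2, 10, 1000]:
    urn = np.array([1] * k + [0] * k)
    # Draw many with-replacement samples of size n and record each sample mean.
    means = rng.choice(urn, size=(100_000, n), replace=True).mean(axis=1)
    print(f"N = {2 * k:5d}: var of sample mean = {means.var():.5f}"
          f" (theory 0.25/n = {0.25 / n:.5f})")
```

All three urns give essentially the same variance for the sample mean, regardless of $N$, as the argument predicts.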

It is difficult to find the origin of the error in the quoted text. It is easy to make little mistakes with $n$ versus $n-1$.

  • That's an interesting perspective. I looked at $N$ more as a way of defining the probability of sampling each individual ball, specifically $p(x)=1/N$, so when $N$ is determined incorrectly the probabilities obtained do not add up to exactly 1 and therefore the estimate of the variance is incorrect. – itdxer Sep 16 '22 at 13:33
  • I also believe there is one small nuance to your example. I believe it's impossible to have this setup with mean 0.5 and odd $N$. In a way you can say that you're conditioning probabilities on a particular model (i.e. the way balls are being sampled in your example). The problem is that sampling with a 0.5 average is impossible when $N$ is odd, and since the model is impossible, the probabilities which are conditional on the model cannot be defined. Specifically, $P(X|M)$ is undefined if $P(M)=0$ (where $M$ is a particular model setup). (Wikipedia: http://tinyurl.com/y9972et2 ) – itdxer Sep 16 '22 at 13:43
  • @itdxer whether $N$ is odd or even is irrelevant for my example. I just used it as a counterexample to show that the formula in your quote is wrong. Whether or not it becomes more complicated in other cases is a different issue; I deliberately used this simple case where it does not matter. – Sextus Empiricus Sep 16 '22 at 13:45

It is better to define the "population variance" with Bessel's correction (i.e., using $N-1$ in the denominator)

You will find that many of the texts in sampling theory give results pertaining to this population quantity, but many will avoid using the term "population variance" to name this quantity. The section of the text that you highlight in your question is an example of this --- although they state a sampling result in terms of this quantity, they actually avoid giving it a name. Presumably they want to remain agnostic as to what quantity (if any) constitutes the "population variance".

In order to see whether it is better to include or exclude Bessel's correction in the "population variance", let us first examine the properties of the relevant quantities. Suppose you have a finite population of values $Y_1,...,Y_N$ and let $F_N$ denote their empirical distribution. To facilitate our analysis, we will define two alternative population quantities, with and without Bessel's correction:

$$R_N^2 = \frac{1}{N} \sum_{i=1}^N (Y_i-\bar{Y}_N)^2 \quad \quad \quad \quad \quad S_N^2 = \frac{1}{N-1} \sum_{i=1}^N (Y_i-\bar{Y}_N)^2.$$
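(For concreteness, these two quantities correspond to the two `ddof` conventions of numpy's `var`; a small sketch with made-up population values:)

```python
import numpy as np

Y = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical population values
N = len(Y)

R2 = Y.var(ddof=0)  # divides by N,     i.e. R_N^2
S2 = Y.var(ddof=1)  # divides by N - 1, i.e. S_N^2

assert np.isclose(S2, R2 * N / (N - 1))  # the two differ only by the Bessel factor
print(R2, S2)  # 4.0  4.571...
```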

In the text you are looking at, the moments given are (implicitly) conditional on the population values. I will make this conditioning explicit by conditioning on $F_N$. If we take a simple random sample without replacement from the population (equivalent to assuming that we sample $Y_1,...,Y_n$ from an exchangeable finite population) then it can be shown that:

$$\mathbb{V}(\bar{Y}_n | F_N) = \frac{N-n}{N-1} \cdot \frac{R_N^2}{n} = \frac{N-n}{N} \cdot \frac{S_N^2}{n}.$$

(The latter result is the one that Davison and Hinkley state as $\mathbb{V}(\bar{Y}) = (1-f) n^{-1} \gamma$ in their own notation; I prefer to state it with explicit conditioning and using my own notation.) As you can see from this result, both formulae involve a "finite-population correction" term and a "variance" term. If you use the quantity $R_N^2$ (that does not include Bessel's correction) then you end up shoe-horning Bessel's correction into the finite-population correction term. However, if you use the quantity $S_N^2$ (that does include Bessel's correction) then you get a much cleaner and more natural form for the finite-population correction term.
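This result is also easy to verify by simulation (a minimal sketch; the normal population and the sample size below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=100)  # hypothetical finite population, N = 100
N, n = len(Y), 10

R2 = Y.var(ddof=0)  # R_N^2
S2 = Y.var(ddof=1)  # S_N^2

# Variance of the sample mean over many simple random samples without replacement.
means = np.array([rng.choice(Y, size=n, replace=False).mean()
                  for _ in range(100_000)])

print("simulated variance:     ", means.var())
print("(N-n)/(N-1) * R_N^2 / n:", (N - n) / (N - 1) * R2 / n)
print("(N-n)/N     * S_N^2 / n:", (N - n) / N * S2 / n)  # same value, cleaner correction term
```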

If we wish to allow a general notion of a "variance" of a set of numbers (as opposed to a variance of a probability distribution), it is natural to ensure that the definition of the "variance" remains consistent as the sample size changes. It is also desirable that it have other useful properties such as being an unbiased estimator with a stable expectation, as is the case for the standard definition of the "sample variance". This is accomplished by defining both the "sample variance" and "population variance" to incorporate Bessel's correction ---i.e., the sample variance is $S_n^2$ and the population variance is $S_N^2$. This makes many results in sampling theory cleaner and simpler, as with the above conditional variance result. It also ensures that the expectation of "the variance" remains fixed and does not "jump down" suddenly at the point where $n=N$.

Another desirable property of this approach can be seen if we use model-based sampling theory, where we embed the finite population within an infinite superpopulation $Y_1,Y_2,Y_3,...$ with mean $\mu$ and variance $\sigma^2$. In this case both the sample variance $S_n^2$ and the population variance $S_N^2$ are unbiased estimators of the superpopulation variance $\sigma^2.$
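A small simulation of this superpopulation claim (again only a sketch; the normal superpopulation and all constants are arbitrary assumptions of mine):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 4.0                   # superpopulation variance (arbitrary)
N, n, reps = 50, 10, 100_000

s2_pop, s2_samp = [], []
for _ in range(reps):
    pop = rng.normal(0.0, np.sqrt(sigma2), size=N)  # finite population drawn from the superpopulation
    samp = rng.choice(pop, size=n, replace=False)   # simple random sample without replacement
    s2_pop.append(pop.var(ddof=1))                  # S_N^2
    s2_samp.append(samp.var(ddof=1))                # S_n^2

print(np.mean(s2_pop), np.mean(s2_samp))  # both close to sigma2 = 4.0
```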

You will find some statisticians who claim that $S_n^2$ is the "sample variance" but that $R_N^2$ should be regarded as the "population variance". This argument is usually made on the basis that $R_N^2$ is the variance of the empirical distribution of the population.$^\dagger$ While that property is indeed true, the argument that this gives a good claim to being the "population variance" is weak and unconvincing --- in particular, it is inconsistent with defining the "sample variance" as $S_n^2$, since the latter is not equal to the variance of the empirical distribution of the sample.

This approach also gives rise to several problems, including an immediate and annoying incongruity when we look at "the variance" of a set of numbers. Under this approach, we end up applying Bessel's correction for all $n<N$ but then suddenly the formula changes and the variance "jumps downward" when we get to $n=N$ (as does its expectation, etc.). This introduces all sorts of silliness into the discussion and creates a number of downstream inconsistencies in relevant formulae (e.g., the fact that Bessel's correction ends up getting shoe-horned back into the finite-population correction term). Unfortunately this approach is something that is commonly asserted to bemused students in undergraduate statistics courses, without any convincing argument backing it up. Students naturally wonder why "the variance" of a set of numbers should suddenly jump down at one particular value of $n$ but they are given no satisfying explanation for this (because there is none).

Probably the most damning aspect of treating $S_n^2$ as "the variance" of a sample and $R_N^2$ as "the variance" of the population is that you then literally cannot compute the variance of a set of numbers without knowledge of the population size! This occurs because the relevant formula changes between $n<N$ and $n=N$ so that the variance formula is now a function of $N$ ---i.e., you need to know if the numbers you have are the full population or not. If I give a practitioner holding this position a set of numbers, with no further information, they literally cannot compute "the variance" of those numbers --- extraordinary!

As noted at the start of this answer, many authors in this field sidestep the discussion entirely by refusing to call anything the "population variance" and merely referring to the quantities at issue with their algebraic notation. While this is a defensible position, I prefer to take the approach of saying that there should be a meaningful definition of "the variance" of any set of numbers, and the quantity $S_N^2$ is the one that fits the bill for the population values. You will find that if you adopt this view, a whole lot of things in sampling theory become a lot simpler and cleaner.


$^\dagger$ That is, it can be shown that:

$$R_N^2 = \int \limits_\mathbb{R} (y-\bar{Y}_N)^2 dF_N(y).$$
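(For a discrete empirical distribution this is immediate, since the integral reduces to a sum with mass $1/N$ on each point; a short check in the same numpy convention as the sketches above, with made-up values:)

```python
import numpy as np

Y = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical population values
N = len(Y)

# F_N puts mass 1/N on each Y_i, so the integral is a weighted sum of squared deviations.
integral = np.sum((1.0 / N) * (Y - Y.mean()) ** 2)
assert np.isclose(integral, Y.var(ddof=0))  # equals R_N^2
```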

A secondary argument sometimes presented is that this quantity is equal to the conditional variance of a single sampled value when we condition on the population values ---i.e., we have $R_N^2 = \mathbb{V}(Y_i|F_N)$. That is also not a good basis for defining the quantity as the "population variance", since this latter result refers only to variability in a single value.

Ben
  • Regarding the inconsistency with the definition of the sample variance: note that there are authors that define the sample variance (with explicit reference to the moments of the empirical distribution) as $\frac{1}{n} \sum_{i=1}^n (Y_i-\bar{Y}_n)^2$. IMHO this is equally defensible. – statmerkur Sep 16 '22 at 14:32
  • I find these arguments unconvincing. One of the reasons is that $S_N^2$ is undefined for populations with $N=1.$ This alone ought to override the subjective "unnaturality" of $(N-n)/(N-1)$ compared to $(N-n)/N.$ Another is that the dismissal of the empirical distribution argument appears to be based on an error: the variance of the empirical distribution is $R_N^2.$ A third is that "Bessel's correction" is used for estimators rather than to characterize properties of populations. A fourth is that the with-replacement claim made in the quotation is incorrect. – whuber Sep 16 '22 at 21:56
  • @whuber: (1) The correction term $(N-n)/(N-1)$ is also undefined (or just infinity) if $N=1$, so neither method has any advantage in that special case; (2) I don't know why you think I am disagreeing that $R_N^2$ is the variance of the empirical distribution when I have stated exactly that in the question; (3) That is question-begging; moreover, the population variance is an unbiased estimator for the superpopulation variance; (4) I have not commented on sampling with replacement here and it is not part of my argument (and I agree that there is an error in the quoted section in this case). – Ben Sep 16 '22 at 22:35
  • @whuber: Out of interest, do you consider $S_n^2$ to be the "sample variance" and $R_N^2$ the "population variance"? If so, how do you reconcile those disparate treatments? (If not, what alternative treatment do you prefer?) – Ben Sep 16 '22 at 22:37
  • I confess to being completely lost by your use of "population variance" and "superpopulation variance." I am not trying to discuss terminology -- I rarely introduce "sample variance" and "population variance" unless they are already in use in a context, because I find them potentially confusing -- and I have only done my best to interpret your post; I might have misinterpreted it; but I cannot find mathematical justification for that interpretation. The treatments I prefer are those that are clear and demonstrate their mathematical accuracy. – whuber Sep 16 '22 at 23:18
  • @whuber: Sorry, perhaps I was a bit forceful in my response. I have edited the answer to clarify my remarks about the superpopulation variance. My concern in the answer is (1) to make the case that $S_N^2$ is the best quantity to consider as the “population variance”; and (2) show that framing of results in terms of this quantity is simpler and more natural than framing in terms of $R_N^2$. I doubt we disagree on any of the relevant mathematical results, so the question really comes down to terminology and ease of usage of alternative ways of framing the relevant formulae. – Ben Sep 16 '22 at 23:58
  • @whuber: Incidentally, I have a partially written paper on this topic that I plan to complete, which should set things out more systematically. Hopefully I can finish and publish that to give a more fleshed-out argument. – Ben Sep 16 '22 at 23:58
  • @whuber The proof for the variance of the mean without replacement implicitly assumes that $N \gt 1$, and the $(N-1)$ factor there appears from the $E[Y_i Y_j]$ term with $i \ne j$, since $P(Y_i, Y_j) = 1/(N(N-1))$. Interaction between two different variables is obviously impossible whenever $N=1$. The variance for sampling with replacement, on the other hand, I believe is incorrect. As has been pointed out by Sextus Empiricus, there should be no way of differentiating between finite and infinite population variance, and the answer should be the same. – itdxer Sep 17 '22 at 13:19
  • @Ben I believe the reason why the "jump" in the variance happens when we increase the sample size from $N-1$ to $N$ is that the mean, as well as the $Y_j$ values which are being used to compute it, are no longer random variables when $n=N$. In the proof of Bessel's correction, $E[Y_iY_j]$ is interpreted differently depending on whether $i$ equals $j$ or not (i.e. $E[Y^2]$ or $E[Y]^2$). But the interpretation is different only because both $Y$s are random variables. In the full population variance estimate $Y_j$ is constant and therefore $E[Y_iY_j]=E[Y_i]Y_j=E[Y]Y_j$. – itdxer Sep 17 '22 at 15:26
  • @Ben The argument about the superpopulation is very interesting; it's a lot more relevant to the bootstrap topic and I didn't think of it from this perspective. Would you say that $S_n^2$, which is an unbiased estimate from the finite population, is also an unbiased estimate of the variance of the superpopulation? – itdxer Sep 17 '22 at 16:25
  • @Ben I agree that authors don't use the population variance explicitly, but I believe it's implied through the formula for the variance of the mean estimate when the observations are sampled with replacement, since $Var[\bar{Y}]=Var[Y]/n$, which means $\gamma=Var[Y]$. Although I think the argument about the superpopulation might actually invalidate it, since there $\gamma$ is just an estimate of the superpopulation variance. But I'm still not sure if replacing $\gamma$ by $c$ will make an unbiased estimate. – itdxer Sep 17 '22 at 16:33
  • That's not really a justification for a jump --- it's just a very hand-wavey appeal to the internal mechanics of one particular proof (of a result that supports the use of the correction at issue). As $n$ increases up to $N$, the level of randomness (measured by variance) in $\bar{Y}_n$ decreases down to zero in a smooth fashion. The mere fact that the variance finally goes down to zero at $n=N$ is no reason that we should suddenly change the formula for the variance. – Ben Sep 17 '22 at 22:57
  • There is no jump. If you take $n$ samples with replacement (so each sample observation is independent) then your $S^2 =\frac1{n-1} \sum_1^n (Y_i-\bar Y_n)^2$ is an unbiased estimator of the variance of each $Y_i$, and $\frac{1}{n}S^2$ an unbiased estimator of the variance of $\bar Y_n$. If you are sampling without replacement from a finite population then there is dependence and negative correlation, so these estimators are biased; an unbiased estimator of the variance of each $Y_i$ is $\frac{N-1}{N} S^2$, including when $n=N$, and of the variance of $\bar Y_n$ is $\frac{N-n}{N}\frac1n S^2$. – Henry Mar 10 '23 at 01:23