
The z-test to compare two proportions is $\newcommand{\p}{\hat{p}}\newcommand{\v}{\mathrm{Var}} z=\frac{\p_1-\p_2}{\sqrt{\v(\p_1-\p_2)}}$. The variance in the denominator is usually defined as

$$\v(\p_1-\p_2)=\p(1-\hat{p})(1/n_1+1/n_2),$$

where

$$\p=\frac{n_1 \p_1+n_2 \p_2}{n_1+n_2}.$$

Is there any written reference that would legitimize using the unpooled variance instead, that is,

$$\v(\p_1-\p_2)=\frac{\p_1(1-\p_1)}{n_1}+\frac{\p_2(1-\p_2)}{n_2}?$$
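For concreteness, here is a minimal R sketch (the counts are illustrative, not from any data set) computing the statistic with both variance estimates:

```r
# Illustrative counts: x successes out of n trials in each group
x1 <- 45; n1 <- 100
x2 <- 30; n2 <- 100

p1 <- x1 / n1
p2 <- x2 / n2
pp <- (x1 + x2) / (n1 + n2)   # pooled proportion

v_pooled   <- pp * (1 - pp) * (1 / n1 + 1 / n2)
v_unpooled <- p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2

z_pooled   <- (p1 - p2) / sqrt(v_pooled)
z_unpooled <- (p1 - p2) / sqrt(v_unpooled)

# Two-sided p-values against the standard normal
2 * pnorm(-abs(c(pooled = z_pooled, unpooled = z_unpooled)))
```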

glassy
  • If your null hypothesis is $H_0: \pi_1 - \pi_2 = k$ with $k \in (0,1]$, then you are assuming, under the null, that $\pi_1 \neq \pi_2$; in that case the best estimator of the variance is the unpooled one. The pooled variance should be used only when assuming $\pi_1=\pi_2$. – JPMD Mar 02 '23 at 11:00

3 Answers


The unpooled variance tends to be too small. This is because under the null hypothesis there will still be chance variation in the two observed proportions, although the underlying probabilities are equal. This chance variation contributes to the pooled variance but not to the unpooled variance.

As a result, $z$ for the unpooled statistic does not even approximately have a standard normal distribution. For instance, when $n_1 = n_2$ and the true probabilities are both $1/2$, the variance of $z$ is only $1/2$ instead of $1$. By using tables of the standard normal distribution, you will get incorrect p-values: they will tend to be artificially small, too often rejecting the null when the evidence is not really there.
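A minimal R sketch (not the answer's original simulation code; the sample size $n_1 = n_2 = 20$ is an arbitrary choice) that one can run to examine the variance of the unpooled $z$ under this null:

```r
set.seed(1)
n <- 20                      # n1 = n2 = n; both true probabilities are 1/2
reps <- 1e5
x1 <- rbinom(reps, n, 0.5)
x2 <- rbinom(reps, n, 0.5)
p1 <- x1 / n; p2 <- x2 / n

v_unpooled <- p1 * (1 - p1) / n + p2 * (1 - p2) / n
z <- (p1 - p2) / sqrt(v_unpooled)

# Drop degenerate draws where the estimated variance is zero
var(z[is.finite(z)])         # empirical variance of the unpooled z
```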

Nevertheless, one wonders whether this could be corrected. It can. The question becomes whether a corrected value of $z$, based on unpooled estimates, could have greater power to detect deviations from the null hypothesis. A few quick simulations suggest this is not the case: the pooled test (compared to a properly adjusted unpooled test) has a better chance of rejecting the null whenever the null is false. Therefore I haven't bothered to work out the formula for the unpooled correction; it seems pointless.

In summary, the unpooled test is wrong, but with an appropriate correction, it can be made legitimate. However, it appears to be inferior to the pooled test.

whuber
  • You say "For instance, when $n_1=n_2$ and the true probabilities are both 1/2, the variance of z is only 1/2 instead of 1." But if the unpooled variance is too small, the variance of z should be too large, and I would think it would be only slightly too large. – Karl Oct 18 '11 at 16:35
  • Forgive me but I am unable to follow your example. Why should the variance of $z$ be 1? Which values are you assuming for $\hat{p}_1$ and $\hat{p}_2$? – glassy Oct 20 '11 at 11:46
  • @glassy $z$ has (asymptotically) unit variance by construction: the difference $\hat{p}_1-\hat{p}_2$ has been standardized by dividing it by its estimated standard error. – whuber Oct 20 '11 at 15:23
  • I don't want to bother you, but I really do not understand why, if $z$ has unit variance by construction, you state that its variance can be $1/2$. It seems to me that its variance is equal to $\hat{p}(1-\hat{p})\frac{2}{n}$ in one case and $\frac{\hat{p}_1(1-\hat{p}_1)}{n}+\frac{\hat{p}_2(1-\hat{p}_2)}{n}$ in the other. Sorry, I do not understand how these quantities have a 2:1 ratio. Indeed, in the case $\hat{p}_1=\hat{p}_2$ they are the same. – glassy Oct 20 '11 at 20:23
  • I don't agree at all. Why not also say that the construction of the confidence interval for the difference between two proportions contradicts the normal distribution? First: in any case $z$ cannot have the $t$ distribution, because it is not a mean (or sum, or linear combination) of normal random variables; on the contrary, it converges directly to the normal distribution as $n$ diverges (or $n_1$ and $n_2$, if you prefer). Second: the pooled and unpooled estimators of the variance are both correct and consistent. – glassy Oct 21 '11 at 07:23
  • Moreover: they are asymptotically equivalent. With large $n$ it makes no difference which one you use, and you did not explain why in a certain case (which one, precisely?) one leads to a standardized variable whose variance is half that of the standard normal variable (I asked for the computation, but I did not see anything). – glassy Oct 21 '11 at 07:23
  • For small or moderate $n$, they can differ in power, as happens, for example, when standardizing the difference between two means of normal variables using a pooled or an unpooled variance estimator. In such a case, you can state that only the first option has a $t$ distribution, and you can argue that the second option has less power than the first, but you cannot state that the second option does not asymptotically have the $z$ distribution. I'm sorry for splitting my comment, but the site has a strict character limit. – glassy Oct 21 '11 at 07:23
  • @glassy There's too much to respond to here. You have said many things, which might or might not be true, but you provide no analysis or calculations to support them. Is there a particular point you're trying to make that would help clarify or improve my response? – whuber Oct 21 '11 at 14:49
  • You state that the unpooled $z$ does not have an asymptotic standard normal distribution. I claim that this statement is false. – glassy Oct 22 '11 at 06:29
  • Since the variance of $z$ is always $1/2$, there is no way it could asymptotically have a distribution which, by definition, has a variance of $1$. – whuber Dec 11 '15 at 17:06
  • @whuber Do you have a reference for the statements provided in your answer? I did find a paper regarding pooled vs unpooled t-test, where the pooled version is also preferred, but not for the two proportions z-test. – Amonet Nov 03 '20 at 09:12
  • @Amonet All the statements I made in this post were derived from the simulations and reasoning I have described. I did not refer to any authorities to deduce them. – whuber Nov 03 '20 at 14:04
  • @luke.sonnet I believe an application of Jensen's inequality to the function $p\to p(1-p)$ will prove the unpooled variance is smaller than the pooled variance. I also believe the pooled variance is an unbiased estimator of the variance of $z.$ The "too small" conclusion follows. – whuber Nov 24 '21 at 22:49
  • This answer is incorrect. Asymptotically, there is no difference between using the pooled or the unpooled variance. You are likely using the wrong estimator for the unpooled variance in your simulations; remember, for $n_1 + n_2 = n$, the unpooled variance should be $\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}$, not $\frac{\hat{p}_1(1-\hat{p}_1)}{n}+\frac{\hat{p}_2(1-\hat{p}_2)}{n}$. – Guillaume F. Apr 27 '22 at 01:50
  • @GuillaumeF. Why would asymptotic behaviors be relevant to this question? – whuber Apr 27 '22 at 14:55
  • @whuber when the finite sample properties of an estimator are complicated or intractable, we study the asymptotic properties instead as they give us a good idea of what happens as the size becomes large. – Guillaume F. Apr 27 '22 at 19:07
  • @Guillaume Of course. But "as the size becomes large," although often suggestive, does not necessarily explain anything about small sizes. – whuber Apr 27 '22 at 20:30
  • @whuber I did some simulations showing you are wrong – Guillaume F. Apr 29 '22 at 03:37
  • @Guillaume "You are wrong" sounds calculated to be inflammatory rather than helpful. It would be nice to know what specific assertion you believe to be incorrect. – whuber Apr 29 '22 at 11:40
  • "The variance of z is only 1/2 instead of 1" is incorrect. – Guillaume F. Apr 29 '22 at 12:11
  • @Guillaume Thank you: I'll check into that when I get a chance. – whuber Apr 29 '22 at 12:19

There is quite a bit of discussion about this on the AP site.

You can use whatever statistic you want, provided that you are clear about what you are doing and look at the appropriate null distribution to calculate p-values or thresholds.

But some statistics are better than others; in this case you'd be looking for (a) a null distribution that is easy to calculate and (b) power to detect a difference.

That said, I don't know why you'd favor the unpooled variance over the pooled variance for the test, though it could be preferred when calculating a confidence interval for the difference.

Karl
  • +1 That's a good discussion you found. However, it seems to fall short of really addressing the question, which is whether somehow the unpooled statistic could be corrected to give the desired test size and, perhaps, yield greater power. To resolve this issue, I have provided a separate reply. – whuber Oct 18 '11 at 16:05
  • Your link doesn't go to a discussion; it goes to a page with Charles Peltier's viewpoint. Not sure why this is the selected answer, as it doesn't answer anything for me. "Use whatever statistic" isn't concrete enough. – Jarad Jul 27 '16 at 15:40
  • @Jarad One definition of the word "discussion" is "a detailed treatment of a particular topic"; that's what I meant. The selected answer is chosen by the person asking the question. By "use whatever statistic you want", I was referring to the "...reference that legitimizes me..." part of the question. – Karl Jul 28 '16 at 16:31

The unpooled z-test is valid, but in general has worse small-sample properties than the pooled z-test

Consider the estimated difference of proportions $\hat{d} = \hat{p}_1 - \hat{p}_2$. There is nothing fundamentally wrong with using the following unpooled estimate of the variance of $\hat{d}$:

$$ \hat{V}_{U} = \frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2} $$

In fact, it is frequently used when constructing confidence intervals. Most elementary textbooks suggest the following approximate $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$:

$$\hat{d} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}} $$

which directly involves $\hat{V}_U$. However, when testing $p_1 = p_2$, the following pooled estimator is often preferable:

$$ \hat{V}_{P} = \hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$

The pooled estimator is only valid when $p_1 = p_2$, which is why it can't be used when constructing confidence intervals.
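As a concrete illustration of the textbook interval above, a short R sketch (with illustrative counts of my own choosing):

```r
# Approximate 95% CI for p1 - p2 using the unpooled variance
x1 <- 45; n1 <- 100; x2 <- 30; n2 <- 100
p1 <- x1 / n1; p2 <- x2 / n2
se_U <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
(p1 - p2) + c(-1, 1) * qnorm(0.975) * se_U
```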

The unpooled estimated variance of $\hat{d}$ is slightly more biased

Let $n = n_1 + n_2$. The true variance of $\hat{d} = \hat{p}_1 - \hat{p}_2$ is equal to

$$\begin{aligned}V(\hat{d}) &= V(\hat{p}_1) + V(\hat{p}_2) \\ &= \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} \end{aligned}$$

When $p_1 = p_2 = p$, this reduces to

$$V(\hat{d}) = p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right) $$

The pooled estimator of the variance of $\hat{d}$ has bias equal to

$$E[\hat{V}_P] - V(\hat{d}) = \frac{-p(1 -p)}{n}\left(\frac{1}{n_1} + \frac{1}{n_2} \right)$$

Similarly, the unpooled estimator of the variance has bias equal to

$$E[\hat{V}_U] - V(\hat{d}) = -p(1-p)\left(\frac{1}{n_1^2} + \frac{1}{n_2^2} \right)$$

Both biases go to zero as $n \to \infty$, but because $n = n_1 + n_2$, the bias of $\hat{V}_P$ is always smaller in magnitude than that of $\hat{V}_U$. Specifically, when $n_1 = n_2$, the pooled estimate of the variance is half as biased as the unpooled estimate.
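These bias expressions can be checked numerically by exact enumeration over all binomial outcomes; here is a small R sketch under the stated assumption $p_1 = p_2 = p$ (the values of $p$, $n_1$, and $n_2$ are arbitrary):

```r
p <- 0.3; n1 <- 10; n2 <- 15; n <- n1 + n2
x1 <- 0:n1; x2 <- 0:n2
w  <- outer(dbinom(x1, n1, p), dbinom(x2, n2, p))  # joint pmf of (x1, x2)
p1 <- x1 / n1; p2 <- x2 / n2
pp <- outer(x1, x2, "+") / n                       # pooled estimate per outcome

EV_P <- sum(w * pp * (1 - pp)) * (1 / n1 + 1 / n2)
EV_U <- sum(w * outer(p1 * (1 - p1) / n1, p2 * (1 - p2) / n2, "+"))
V_true <- p * (1 - p) * (1 / n1 + 1 / n2)

EV_P - V_true                          # exact bias of the pooled estimator
-p * (1 - p) / n * (1 / n1 + 1 / n2)   # formula above: should match
EV_U - V_true                          # exact bias of the unpooled estimator
-p * (1 - p) * (1 / n1^2 + 1 / n2^2)   # formula above: should match
```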

It is unclear which test rejects more often

The z-test statistic for the unpooled test will be more extreme than the pooled z-test statistic when

$$\frac{|\hat{d}|}{\sqrt{\hat{V}_P}} = \frac{|\hat{d}|}{\sqrt{\hat{p}(1 - \hat{p})(\frac{1}{n_1} + \frac{1}{n_2})}} < \frac{|\hat{d}|}{ \sqrt{ \frac{ \hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{ \hat{p}_2 (1 - \hat{p}_2)}{n_2} } } = \frac{|\hat{d}|}{\sqrt{\hat{V}_U}}$$

or equivalently, when the estimated variance of the unpooled test is smaller than that of the pooled test,

$$ \hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right) > \frac{ \hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{ \hat{p}_2 (1 - \hat{p}_2)}{n_2} $$

It is unclear when this holds in general. However, for samples of equal size ($n_1 = n_2$), it simplifies to

$$\hat{p}(1 - \hat{p}) > \frac{\hat{p}_1 (1 - \hat{p}_1) + \hat{p}_2 (1 - \hat{p}_2)}{2} $$

Because $f(x) = x(1-x)$ is strictly concave, Jensen's inequality shows the previous inequality always holds (strictly whenever $\hat{p}_1 \neq \hat{p}_2$). Therefore, for equal sample sizes, the unpooled test rejects at least as often as the pooled test.

Simulations

To compare the two estimators, I ran 100 million simulations in R with small sample sizes ($n_1 = n_2 = 20$, $p_1 = p_2 = 0.5$). Under these parameters, the true variance of $\hat{d}$ equals $0.025$.

| Parameter | Pooled | Unpooled |
|---|---|---|
| Variance of $\hat{z}$ | 1.0255 | 1.1146 |
| Mean of $\hat{V}$ | 0.0244 | 0.0237 |
| Rejection rate at $\alpha = 0.05$ | 0.0425 | 0.0807 |

We see that the bias of the unpooled variance is twice as large as that of the pooled variance. This yields a $z$ statistic whose variance is higher than that of a standard normal (1.115 vs. 1), which in turn produces a null rejection rate above the nominal level ($\alpha = 0.05$).
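A condensed version of such a simulation (a function of my own, with far fewer replications than the 100 million reported, so expect some Monte Carlo noise) might look like:

```r
set.seed(42)
sim <- function(n1, n2, p, reps = 1e6, alpha = 0.05) {
  x1 <- rbinom(reps, n1, p); x2 <- rbinom(reps, n2, p)
  p1 <- x1 / n1; p2 <- x2 / n2
  pp <- (x1 + x2) / (n1 + n2)
  d  <- p1 - p2
  vP <- pp * (1 - pp) * (1 / n1 + 1 / n2)
  vU <- p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
  zP <- d / sqrt(vP); zU <- d / sqrt(vU)
  ok <- is.finite(zP) & is.finite(zU)   # drop degenerate 0/0 draws
  crit <- qnorm(1 - alpha / 2)
  c(var_zP = var(zP[ok]),  var_zU = var(zU[ok]),
    mean_vP = mean(vP),    mean_vU = mean(vU),
    rej_P = mean(abs(zP[ok]) > crit), rej_U = mean(abs(zU[ok]) > crit))
}
round(sim(20, 20, 0.5), 4)   # first table; sim(100, 100, 0.5) gives the second
```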

However, as we increase the sample size, this difference vanishes. If $n_1 = n_2 = 100$, the true variance of $\hat{d}$ is $0.005$, and we get the following results:

| Parameter | Pooled | Unpooled |
|---|---|---|
| Variance of $\hat{z}$ | 1.005 | 1.020 |
| Mean of $\hat{V}$ | 0.004975 | 0.004950 |
| Rejection rate at $\alpha = 0.05$ | 0.056 | 0.056 |

The bias of $\hat{V}_U$ is still twice as large as that of $\hat{V}_P$, but both biases are now so small that they do not matter, and the two tests have similar rejection rates.

This goes against the results of @whuber, who claims without justification that the variance of the z-test statistic $\hat{z}_{U}$ is 1/2, while in fact it is slightly above 1.

The pooled and unpooled z-tests are asymptotically equivalent when the null hypothesis is true.

I define $c_{nk} = \frac{n_k}{n}$, $k = 1,2$. The true variance of $\hat{d}$ can be written as

$$\begin{aligned} Var(\hat{d}) &= \frac{1}{n} p(1 - p)\left( \frac{1}{c_{n1}} + \frac{1}{c_{n2}}\right) \end{aligned}$$

If we assume that the two samples grow at roughly the same rate, $c_{nk} \to c_k$ with $0 < c_k < 1$, then by the Central Limit Theorem

$$\frac{\hat{d}}{\sqrt{Var(\hat{d})}} = \frac{\sqrt{n}\,\hat{d}}{\sqrt{ p(1 - p)\left( \frac{1}{c_{n1}} + \frac{1}{c_{n2}} \right)}} \to_d N(0,1),$$

which justifies the use of the normal distribution.

However, since $p$ is unknown, we can't use this to construct the test statistic. Instead, we need to use our sample estimates to get an asymptotically valid estimator.

According to the Strong Law of Large Numbers, we have

$$\begin{aligned} \hat{p}_1 \to_{a.s.} p \\ \hat{p}_2 \to_{a.s.} p \\ \hat{p} = \frac{x_1 + x_2}{n} \to_{a.s.} p \end{aligned}$$

From the Continuous Mapping Theorem, we have

$$\begin{aligned} \hat{p} (1 - \hat{p})\left(\frac{1}{c_{n1}} + \frac{1}{c_{n2}} \right)\to_{a.s.} p(1-p)\left(\frac{1}{c_1} + \frac{1}{c_2} \right) \\ \frac{\hat{p}_1(1 - \hat{p}_1)}{c_{n1}} + \frac{\hat{p}_2(1 - \hat{p}_2)}{c_{n2}} \to_{a.s.} p(1-p)\left(\frac{1}{c_1} + \frac{1}{c_2} \right) \end{aligned}$$

Therefore, by Slutsky's theorem, both studentized statistics are asymptotically standard normal:

$$\frac{\sqrt{n}\,\hat{d}}{\sqrt{ \hat{p}(1 - \hat{p})\left( \frac{1}{c_{n1}} + \frac{1}{c_{n2}} \right)}} \to_{d} N(0,1) \qquad \text{and} \qquad \frac{\sqrt{n}\,\hat{d}}{\sqrt{ \frac{\hat{p}_1(1 - \hat{p}_1)}{c_{n1}} + \frac{\hat{p}_2(1 - \hat{p}_2)}{c_{n2}}}} \to_{d} N(0,1)$$

In other words, asymptotically and when the null hypothesis is true, there is no difference between using the exact variance, the unpooled variance, or the pooled variance.

The power of the pooled and unpooled z-test will be different when the alternative is true

Under the alternative hypothesis that $p_1 \ne p_2$, we have

$$ \hat{p} \to_{a.s.} p_1 c_{1} + p_2 c_{2} = \tilde{p} $$

Therefore, for the pooled estimator we asymptotically have

$$ \hat{p} (1 - \hat{p}) \left(\frac{1}{c_{n1}} + \frac{1}{c_{n2}}\right) \to_{a.s.} \tilde{p}(1 - \tilde{p})\left(\frac{1}{c_1} + \frac{1}{c_2}\right) $$

Meanwhile, the unpooled estimator still converges to the true (scaled) variance:

$$ \frac{ \hat{p}_1 (1 - \hat{p}_1)}{c_{n1}} + \frac{ \hat{p}_2 (1 - \hat{p}_2)}{c_{n2}} \to_{a.s.} \frac{ p_1 (1 - p_1)}{c_{1}} + \frac{ p_2 (1 - p_2)}{c_{2}} $$

The unpooled test will be asymptotically more powerful when

$$\frac{1}{\sqrt{\tilde{p}(1 - \tilde{p})(\frac{1}{c_1} + \frac{1}{c_2})}} < \frac{1}{ \sqrt{ \frac{ p_1 (1 - p_1)}{c_{1}} + \frac{ p_2 (1 - p_2)}{c_{2}} } }$$

or equivalently

$$ \tilde{p}(1 - \tilde{p})(\frac{1}{c_1} + \frac{1}{c_2}) > \frac{ p_1 (1 - p_1)}{c_{1}} + \frac{ p_2 (1 - p_2)}{c_{2}} $$

Either estimator could be more powerful, depending on $p_1$, $p_2$, $c_1$ and $c_2$.

However, when the two sample sizes are equal ($c_1 = c_2 = 0.5$), the condition simplifies to

$$ \tilde{p}(1 - \tilde{p}) > \frac{p_1 (1 - p_1) + p_2 (1 - p_2)}{2} $$

Because $f(x) = x(1-x)$ is strictly concave, Jensen's inequality shows this always holds (strictly, since $p_1 \neq p_2$ under the alternative). Therefore, for equal sample sizes, the unpooled test is always asymptotically more powerful.
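To illustrate, here is a small R sketch of the corresponding large-sample power approximation (the function name `power_approx` and the example values are mine; it assumes each test compares $|\hat{d}|$ against $z_{\alpha/2}$ times the square root of its limiting variance estimate, while $\hat{d}$ is approximately normal with mean $p_1 - p_2$ and the true variance):

```r
power_approx <- function(p1, p2, n1, n2, alpha = 0.05) {
  delta  <- p1 - p2
  V_true <- p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2   # unpooled limit
  p_til  <- (n1 * p1 + n2 * p2) / (n1 + n2)
  V_P    <- p_til * (1 - p_til) * (1 / n1 + 1 / n2)   # pooled limit
  crit   <- qnorm(1 - alpha / 2)
  pow <- function(V) pnorm((-crit * sqrt(V) - delta) / sqrt(V_true)) +
    1 - pnorm((crit * sqrt(V) - delta) / sqrt(V_true))
  c(pooled = pow(V_P), unpooled = pow(V_true))
}
power_approx(0.5, 0.7, 50, 50)   # equal n: unpooled at least as powerful
power_approx(0.05, 0.5, 90, 10)  # unequal n: here the pooled test wins
```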

  • The biased variance estimates could be corrected by using Bessel's correction; for example, an unbiased estimate of the pooled variance can be obtained by multiplying the biased estimate by $n/(n-1)$. See here for more details. – Guillaume F. Oct 11 '22 at 19:51