Assume I aggregated the following results:
| Treatment | Count | Total | Proportion |
|-----------|-------|-------|------------|
| Control | $c_n$ | $c_t$ | $p_c$ |
| Variant | $v_n$ | $v_t$ | $p_v$ |
It seems like I can compute the $z$-score in one of two ways:

Option 1 / Pooled (link): the $z$-statistic is given by
$$z = \frac{p_c - p_v}{\sqrt{\frac{\hat{p}(1-\hat{p})}{c_t} + \frac{\hat{p}(1-\hat{p})}{v_t}}}$$
where $p_c$ and $p_v$ are the proportions of the control and variant groups, respectively, and
$$\hat{p} = \frac{c_t p_c + v_t p_v}{c_t + v_t}$$
is the pooled proportion, with $c_t$ and $v_t$ the sample sizes of the control and variant groups.
Option 2 / Unpooled (link): Here the $z$-score is given by:
$$z = \frac{p_c - p_v}{\sqrt{SE_c^2 + SE_v^2}}$$
where
$$ SE_c = \sqrt{\frac{p_c(1-p_c)}{c_t}} \quad ; \quad SE_v = \sqrt{\frac{p_v(1-p_v)}{v_t}} $$
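To make the two options concrete, here is a minimal sketch of how I would compute them by hand, using only the standard library. The function name `two_proportion_z`, the `pooled` flag, and the counts in the example are my own invention for illustration; the two-sided p-value assumes the usual normal approximation.

```python
import math


def two_proportion_z(c_n, c_t, v_n, v_t, pooled=True):
    """Return (z, two-sided p-value) for H0: p_c == p_v.

    c_n, v_n: success counts; c_t, v_t: sample sizes (control, variant).
    """
    p_c = c_n / c_t
    p_v = v_n / v_t
    if pooled:
        # Option 1: one common proportion estimated from both groups.
        p_hat = (c_n + v_n) / (c_t + v_t)
        var = p_hat * (1 - p_hat) * (1 / c_t + 1 / v_t)
    else:
        # Option 2: each group contributes its own variance estimate.
        var = p_c * (1 - p_c) / c_t + p_v * (1 - p_v) / v_t
    z = (p_c - p_v) / math.sqrt(var)
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts, purely for illustration.
z_pooled, p_pooled = two_proportion_z(100, 1000, 130, 1000, pooled=True)
z_unpooled, p_unpooled = two_proportion_z(100, 1000, 130, 1000, pooled=False)
print(f"pooled:   z = {z_pooled:.4f}, p = {p_pooled:.4f}")
print(f"unpooled: z = {z_unpooled:.4f}, p = {p_unpooled:.4f}")
```

With these invented numbers the two statistics are nearly identical, which is part of why I am unsure when the choice actually matters.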
My questions are:
- Under what circumstances should I use one or the other? My understanding is that it depends on what I know or assume about the variances: if I assume the two groups have the same (or nearly the same) variance, I should use the pooled version, and the unpooled version otherwise.
- When would the difference between the two be significant?
- Is there an implemented test in Python where I can choose which approach to use? In `statsmodels.stats.proportion` it seems like only the pooled option is available.
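To illustrate what I mean by the second question, here is a rough sketch that compares the two standard errors in a few scenarios. The helper `standard_errors` and all of the proportions and sample sizes are hypothetical, chosen only to probe when the two denominators drift apart.

```python
import math


def standard_errors(p_c, c_t, p_v, v_t):
    """Return (pooled SE, unpooled SE) of the difference p_c - p_v."""
    p_hat = (c_t * p_c + v_t * p_v) / (c_t + v_t)
    se_pooled = math.sqrt(p_hat * (1 - p_hat) * (1 / c_t + 1 / v_t))
    se_unpooled = math.sqrt(p_c * (1 - p_c) / c_t + p_v * (1 - p_v) / v_t)
    return se_pooled, se_unpooled


# Hypothetical scenarios: (p_c, c_t, p_v, v_t).
scenarios = {
    "balanced, close proportions": (0.10, 1000, 0.13, 1000),
    "balanced, distant proportions": (0.10, 1000, 0.50, 1000),
    "unbalanced, distant proportions": (0.10, 1800, 0.50, 200),
}

ratios = {}
for name, args in scenarios.items():
    se_pooled, se_unpooled = standard_errors(*args)
    # The z-statistics differ by exactly this factor (inverted),
    # because both options share the numerator p_c - p_v.
    ratios[name] = se_pooled / se_unpooled
    print(f"{name}: pooled SE / unpooled SE = {ratios[name]:.3f}")
```

If I have the algebra right, with equal group sizes $c_t = v_t = n$ the pooled variance exceeds the unpooled one by exactly $(p_c - p_v)^2 / (2n)$, so the two only diverge noticeably when the proportions are far apart or the groups are unbalanced.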
Edit:
- Related questions:
- I also found an interesting video; however, it deals with the $t$-test. If I understand correctly, its message carries over to the case I am considering.