7

The formula for the Z statistic in the one-sample Z test of a proportion is:

$$Z = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}$$

How is the denominator derived?

  • 1
    I'm not sure I understand the question. Do you mean what's the motivation for this formula? Are you unaware that the denominator is the formula for the standard deviation, or do you not understand why the standard deviation is used as the denominator? – Acccumulation Apr 27 '22 at 02:25
  • The question is where the proof of this formula is in general, and specifically the proof of the standard error for a proportion. – Nathan B Jan 31 '23 at 08:35

2 Answers

6

When you take a random sample of size $n$ and observe a binary outcome, to conduct this test you first code one of the possible outcomes as $1$ and the other as $0.$ Your model is that the probability of $1$ is some unknown number $p_0.$

Letting the values in the sample be the (random variables) $X_1, X_2, \ldots, X_n,$ the proportion coded $1$ in the sample can be found by summing the $X_i$ (a mathematically simpler operation than counting the ones) because the zeros don't contribute anything. You then divide by the sample size to obtain the proportion:

$$\hat p = \frac{1}{n}\sum_{i=1}^n X_i.$$
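As a small illustration of why summing the coded values is the same as counting the ones, here is a minimal sketch. The `sample` list is hypothetical data invented for this example, not anything from the question.

```python
# Hypothetical sample of binary outcomes, already coded as 1 and 0.
sample = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

n = len(sample)
count_of_ones = sample.count(1)  # counting the ones directly
total = sum(sample)              # summing: the zeros contribute nothing

# The two operations agree, so the sum can stand in for the count.
assert total == count_of_ones

p_hat = total / n                # the sample proportion
print(f"n = {n}, sum = {total}, p_hat = {p_hat}")
```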

Because $\hat p$ is a function of random variables, it, too, is a random variable. The test relies on that and on working out the distribution of $\hat p.$

For an exact test of proportion, we would work out this distribution exactly. The $Z$ test approximates the distribution of $\hat p.$ It uses a Normal distribution. This use is partially justified by the Central Limit Theorem. It is particularly convenient because you can identify any Normal distribution from just two values. The simplest, and most often used, are its mean and variance.
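To get a feel for how close the approximation can be, the sketch below compares an exact upper-tail probability for $\hat p$ with its Normal counterpart. It borrows the mean $p_0$ and variance $p_0(1-p_0)/n$ that are derived in the rest of this answer. The values of `n`, `p0`, and `k` are hypothetical, and no continuity correction is applied, so treat it as an illustration rather than a recommended procedure.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 100    # hypothetical sample size
p0 = 0.5   # hypothetical value of the model probability
k = 60     # hypothetical observed count of ones, so p_hat = k / n

# Exact: under the model, the count of ones has a Binomial(n, p0) distribution.
exact_tail = sum(comb(n, j) * p0**j * (1 - p0) ** (n - j) for j in range(k, n + 1))

# Approximate: treat p_hat as Normal with mean p0 and variance p0 * (1 - p0) / n.
approximation = NormalDist(mu=p0, sigma=sqrt(p0 * (1 - p0) / n))
normal_tail = 1 - approximation.cdf(k / n)

print(f"exact  P(p_hat >= {k / n}) = {exact_tail:.4f}")
print(f"Normal P(p_hat >= {k / n}) = {normal_tail:.4f}")
```

The two tail probabilities are of the same order but not identical, which is what "approximates" means here.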

What's nice about the mean and the variance is that they are easy to work out for $\hat p,$ because (in a sense about to be illustrated) both of these quantities add.

A conventional symbol for the mean, or expectation, is $E$. What it means to "add" is that it is a linear operator: namely, the expectation of a linear combination of random variables is the same linear combination of their expectations:

$$E[\hat p] = E\left[\frac{1}{n}\sum_{i=1}^n X_i\right] = \frac{1}{n}\sum_{i=1}^n E[X_i].$$

The expectation of any of the $X_i$ is, as always, the sum of its possible values weighted by their chances of occurring. Since the chance that $X_i=1$ is $p_0,$ the chance that $X_i=0$ must be $1-p_0,$ and we find

$$E[X_i] = 0 \times (1-p_0) + 1 \times p_0 = p_0.$$

A conventional symbol for the variance is $\operatorname{Var}.$ It is a quadratic form. This has a somewhat complicated meaning in general, but in the special case where the $X_i$ have been independently sampled it means

$$\operatorname{Var}(\hat p) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(X_i).$$

Notice how $1/n$ was squared: that's the meaning of "quadratic."

One formula for the variance is in terms of squared deviations from the mean, $(X_i - E[X_i])^2:$ it's their expectation. Once again, this is the probability-weighted sum of the values, whence

$$\begin{aligned} \operatorname{Var}(X_i) &= (0 - E[X_i])^2 \times (1-p_0) + (1 - E[X_i])^2 \times p_0 \\ &= (-p_0)^2(1-p_0) + (1 - p_0)^2 p_0\\ &=p_0(1-p_0). \end{aligned}$$
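The two probability-weighted sums (for the expectation and for the variance of a single $X_i$) can be checked with exact arithmetic. This is only a sketch: the value `p0 = 3/10` is a hypothetical choice, and `Fraction` is used so that the comparison is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

p0 = Fraction(3, 10)  # hypothetical probability of observing a 1

# The possible values of X_i paired with their chances of occurring.
values_and_chances = [(0, 1 - p0), (1, p0)]

# Expectation: the probability-weighted sum of the values.
mean = sum(x * chance for x, chance in values_and_chances)

# Variance: the probability-weighted sum of squared deviations from the mean.
variance = sum((x - mean) ** 2 * chance for x, chance in values_and_chances)

assert mean == p0
assert variance == p0 * (1 - p0)
print(f"E[X_i] = {mean}, Var(X_i) = {variance}")
```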

Plugging this into the previous formula shows us

$$\operatorname{Var}(\hat p) = \frac{1}{n^2}\sum_{i=1}^n p_0(1-p_0) = \frac{p_0(1-p_0)}{n}.$$

Notice where the $1/n = (1/n^2)\times n$ came from: it equals the square of $1/n$ (because the variance is a quadratic form) but has been repeated $n$ times for the $n$ independent observations $X_i.$
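If you would like to see the result $\operatorname{Var}(\hat p) = p_0(1-p_0)/n$ empirically, a simulation along the following lines is one way to do it. Everything here (the seed, `n`, `p0`, the number of replications `reps`, and the helper `sample_proportion`) is a hypothetical choice made for illustration.

```python
import random
from statistics import fmean

random.seed(12345)  # fixed seed so the run is repeatable
n = 50              # hypothetical sample size
p0 = 0.3            # hypothetical probability of a 1
reps = 20_000       # number of simulated samples

def sample_proportion(n, p0):
    """Draw n independent 0/1 values with P(1) = p0 and return their mean."""
    return sum(1 if random.random() < p0 else 0 for _ in range(n)) / n

p_hats = [sample_proportion(n, p0) for _ in range(reps)]

sim_mean = fmean(p_hats)
sim_var = fmean((p - sim_mean) ** 2 for p in p_hats)
theory_var = p0 * (1 - p0) / n

print(f"simulated mean     = {sim_mean:.4f}   (theory: {p0})")
print(f"simulated variance = {sim_var:.6f} (theory: {theory_var:.6f})")
```

The simulated values will not match the theoretical ones exactly, but they should agree to within simulation error.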

The upshot is that

The Normal approximation to the distribution of $\hat p$ has a mean of $p_0$ and variance of $p_0(1-p_0)/n.$

This is sufficient to work out the $Z$ test. However, it is convenient to use just a single reference distribution rather than a family of distributions that depend on two numbers (parameters). To this end we always standardize $\hat p.$ This simply means to change how we measure it. Just like converting from degrees F to degrees C, we shift its origin (its zero value) to have an expectation of zero and rescale it to have unit variance. Using the same algebraic rules as before -- expectations add and variance is a quadratic form -- we finally deduce that the distribution of

$$Z = \frac{\hat p - E[\hat p]}{\sqrt{\operatorname{Var}(\hat p)}} = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}$$

is approximately that of the standard Normal distribution with mean $0$ and unit variance.
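In code, the standardization amounts to a few lines. The function name `one_sample_proportion_z` and the example counts are hypothetical. The two-sided p-value is an assumption added for illustration; a one-sided alternative would use a single tail instead.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z(successes, n, p0):
    """Return (z, two-sided p-value) for a one-sample Z test of a proportion."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # square root of Var(p_hat) under the model
    z = (p_hat - p0) / standard_error         # shift by the mean, rescale to unit variance
    # Refer z to the standard Normal distribution (mean 0, unit variance).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 ones in a sample of 100, with model value p0 = 0.5.
z, p_value = one_sample_proportion_z(successes=60, n=100, p0=0.5)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

Note that the denominator uses $p_0,$ not $\hat p$: the variance is computed under the model being tested.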

Whenever you see this formula, or one like it, take a moment to recall where each of its terms comes from: it standardizes a test statistic by shifting its mean and rescaling it to have unit variance. This will help you recall such formulas, use them correctly, and understand other statistical formulas as well.

whuber
  • 322,774
  • I see the proof for var(p), but can you add the proof for the last equation: why do we compute (p-E[p])/var(p)^0.5? – Nathan B Jan 31 '23 at 08:43
  • 1
    @Nathan That's a definition. $Z$ can be seen as changing the units of measurement of the variable $\hat p$ by (a) shifting its origin (zero value) to $E[\hat p]$ and (b) rescaling it to have a unit variance. For a nuanced, rigorous, and non-mathematical account of what this accomplishes, see the textbook Statistics by Freedman, Pisani, & Purves. – whuber Jan 31 '23 at 14:34
3

The denominator is the standard deviation of the sampling distribution, which is known as the standard error. Standard error, at least under the assumptions of the z-test, is equal to the population standard deviation divided by the square root of the sample size.

For the binary (0/1) variable you are considering, a direct calculation shows the variance of a single observation to be $p_0(1-p_0)$. Therefore, the standard deviation is $\sqrt{p_0(1-p_0)}$. Then we divide by the square root of the sample size to get $\dfrac{\sqrt{p_0(1-p_0)}}{\sqrt{n}} = \sqrt{\dfrac{p_0(1-p_0)}{n}}$.
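A quick numeric check of that last identity, and of how the standard error shrinks as the sample grows, might look like this. The value of `p0` and the sample sizes are hypothetical.

```python
from math import isclose, sqrt

p0 = 0.4                             # hypothetical null proportion
population_sd = sqrt(p0 * (1 - p0))  # standard deviation of a single 0/1 observation

standard_errors = {}
for n in (25, 100, 400):             # hypothetical sample sizes
    two_step = population_sd / sqrt(n)   # population SD divided by sqrt(n)
    one_step = sqrt(p0 * (1 - p0) / n)   # the denominator of the Z statistic
    assert isclose(two_step, one_step)   # the two forms agree
    standard_errors[n] = one_step
    print(f"n = {n:3d}: standard error = {one_step:.4f}")
```

Quadrupling the sample size halves the standard error, because $n$ enters through its square root.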

Dave
  • 62,186