I have read that you should use a Z-Test when you have a sample size $n > 30$, because this is where your sample distribution becomes normally distributed.

The Z-Test equation has more power when you increase the sample size:

$$Z = \frac{\bar X - \mu_0}{\frac{\sigma}{\sqrt{n}}}$$

where $\bar X$ is the sample mean, $\mu_0$ is the population mean, $\sigma$ is the standard deviation of the population, and $n$ is the sample size.
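
As a minimal sketch of the formula above, the statistic and a two-sided p-value can be computed with the Python standard library. The function name `z_test` and the four data values are hypothetical, chosen only for illustration, and the sketch assumes $\sigma$ is known:

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test(sample, mu0, sigma):
    """Return (z, two-sided p-value) for H0: mean == mu0, with sigma known."""
    n = len(sample)
    standard_error = sigma / sqrt(n)  # sd of the sampling distribution of the mean
    z = (fmean(sample) - mu0) / standard_error
    # Two-sided p-value from the standard normal reference distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 4 observations, hypothesised mean 10, known sigma 2.
z, p = z_test([11.0, 12.0, 13.0, 12.0], mu0=10.0, sigma=2.0)
print(z, p)
```

The p-value is only exact if the data are normal; otherwise it relies on the normal approximation for $\bar X$.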

Do you still need to worry about your sample size if you have a good estimate of $\mu_0$ and $\sigma$? Or can you replace the $\frac{\sigma}{\sqrt{n}}$ term with $\sigma$?
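
A short worked step may make the second question concrete. The symbol $Z'$ is introduced here only for illustration, and the step assumes $\sigma$ is known and that $Z$ is (at least approximately) standard normal under the null hypothesis:

$$Z' = \frac{\bar X - \mu_0}{\sigma} = \frac{Z}{\sqrt{n}}, \qquad Z \sim N(0, 1) \;\Longrightarrow\; Z' \sim N\!\left(0, \frac{1}{n}\right)$$

Under these assumptions, dropping the $\sqrt{n}$ only rescales the statistic: the critical value for $Z'$ becomes $z_{\alpha/2}/\sqrt{n}$, so $n$ still enters the test and the decision is the same as with $Z$.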

Connor
  • If the sample is normal and the variance is known, the z-statistic has an exact normal distribution for any sample size. If the variance $\sigma^2$ is unknown and is estimated by the usual sample variance $s^2$, then the z-test becomes the t-test, and the statistic has an exact Student's t distribution. If the sample is not normal, you can appeal to the CLT to conclude that the larger $n$ is, the closer we are to the normal distribution. – utobi Mar 30 '23 at 08:15
  • What do you mean by "if the sample is normal"? If I have a sample size of 1, what does saying "the sample is normal" mean? – Connor Mar 30 '23 at 09:18
  • "If the sample is normal" means "Assume $X_1,\ldots,X_n$ such that $X_i\sim N(\mu, \sigma^2)$, for all $i=1,\ldots, n$". – utobi Mar 30 '23 at 09:25
  • Some errors and misunderstandings here. "$\sigma$ is the standard deviation of the sampling distribution": no, it is the standard deviation of the underlying distribution; the sd of the sampling distribution is $\sigma/\sqrt{n}$. – Christian Hennig Mar 30 '23 at 09:50
  • $\mu_0$ is the hypothesised mean to test, which is not necessarily the population mean (if you knew it is, you wouldn't need a test). – Christian Hennig Mar 30 '23 at 09:51
  • If $\sigma$ is not given, you can never use the Z-test in its original form, as this assumes $\sigma$ to be known. If you don't know it (which is the case in 99% of applications), you will have to estimate $\sigma$ anyway, usually by the square root of the sample variance. However, for $n\to\infty$, due to the Central Limit Theorem, this will be equivalent to running the Z-test as if the estimated sd were the true one. For $n\to\infty$ it is also equivalent, even for non-normal data, to running a t-test, which is arguably better than the Z-test in most cases. – Christian Hennig Mar 30 '23 at 09:55
  • "I have read that you should use a Z-Test when you have a sample size n>30, because this is the point at which your sample becomes normally distributed." It is not the "sample" that becomes normally distributed, but only the sampling distribution of the sample mean, and this never becomes precisely normally distributed, only approximately, unless the underlying distribution is exactly normal. – Christian Hennig Mar 30 '23 at 09:57
  • @ChristianHennig Is this a notational issue or a concept issue? Does using $\sigma$ without a subscript always mean I'm using the population standard deviation? What is the notation for the standard deviation of the sampling distribution if it's not $\sigma$? – Connor Mar 30 '23 at 09:58
  • @Connor Well, in general all notation should be explicitly defined, and the same name does not necessarily always mean the same thing unless said explicitly. However, not only is $\sigma$ a standard choice of notation for the population sd, it is also the population sd in your definition of the Z-statistic, so I was basically referring to your use of notation. – Christian Hennig Mar 30 '23 at 10:01
  • @ChristianHennig What is the sampling distribution then? The first result on google defines it as: "A sampling distribution is a probability distribution of a statistic that is obtained through repeated sampling of a specific population." Surely that means if $X$ is my sample that it has a normal distribution? – Connor Mar 30 '23 at 10:01
  • @Connor The distribution of $X$ is the population distribution. The distribution of the statistic you compute from the sample, say $\bar X$ here, is the sampling distribution. – Christian Hennig Mar 30 '23 at 10:03
  • @ChristianHennig Okay, great! Thank you. So the question then is, if I sample the population distribution 1,000 times and that gives me some mean and standard deviation, can I use that to test a single sample and check how likely it is to be part of the population distribution? i.e. can I perform a Z-Test on the single new sample using my approximation of $\mu$ and $\sigma$? – Connor Mar 30 '23 at 10:05
  • "I have read that you should use a Z-Test when you have a sample size n>30, because this is the point at which your sample becomes normally distributed." There is no fixed point at which the sampling distribution of the Z-statistic becomes normal for non-normal data. The larger $n$, the closer to a normal it becomes (under the assumptions of the CLT, which are not always fulfilled). $n>30$ is a standard "rule of thumb" to give people some orientation, but in reality the approximation may not yet be good at $n=30$ or may be good already earlier, depending on the underlying distribution. – Christian Hennig Mar 30 '23 at 10:07
  • @Connor Please change your question if you want to change it, or ask a new one if you want to ask a new one. Comments are not for addressing your question but rather for highlighting issues with the question, which is what I'm doing. Note also that I won't go to chat (in case the system tells you you should because we're writing too many comments). – Christian Hennig Mar 30 '23 at 10:09
  • @ChristianHennig So $\sigma_0$ is incorrect and I should use $\sigma$ for the population sd as this is presumed to be known in the normal Z-Test. I've changed this, and although you won't answer, can you tell me if my question makes sense now? – Connor Mar 30 '23 at 10:18
  • @ChristianHennig Does this question make more sense: https://stats.stackexchange.com/questions/611236/how-can-i-check-a-value-y-i-is-drawn-from-a-population-distribution-using-ba – Connor Mar 30 '23 at 10:32
  • @Connor Original question seems OK now. – Christian Hennig Mar 30 '23 at 11:29
  • @ChristianHennig If the question is now okay, what's preventing someone from answering it? Not meaning to be rude, genuinely curious about how question answering works on this site. It doesn't seem to happen very much! (I've read this meta: https://stats.meta.stackexchange.com/questions/2083/unanswered-questions-as-percentage-of-total-why-does-cv-stand-out, so I get there are reasons. But what can I do about it?) – Connor Mar 30 '23 at 21:09
  • @Connor "Do you still need to worry about your sample size": "worry" is a very subjective term and I don't really know what it would mean to say yes or no to this. "Can you replace the $\sigma/\sqrt{n}$ by $\sigma$?": that's a weird idea. It would make $Z$ smaller by a factor of $\sqrt{n}$, so it could work, but then you'd have to rescale the distribution of the test statistic by the same factor and the result would be equivalent, so why would anybody think that's an improvement? – Christian Hennig Mar 30 '23 at 22:28
  • Similar questions (maybe a duplicate?) https://stats.stackexchange.com/questions/85804/choosing-between-z-test-and-t-test https://stats.stackexchange.com/questions/597162/can-we-use-z-test-when-the-population-standard-deviation-is-known-but-the-sample https://stats.stackexchange.com/questions/430758/under-what-conditions-should-i-use-an-approximate-z-score-vs-a-t-test https://stats.stackexchange.com/questions/590023/why-is-t-test-more-appropriate-than-z-test-for-non-normal-data – kjetil b halvorsen Nov 14 '23 at 13:35
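
The rule-of-thumb point raised in the comments can be checked with a small simulation. The sketch below estimates how often a nominal 5% two-sided Z-test rejects a true null hypothesis when the data are skewed. The choice of an Exponential(1) population (true mean and sd both equal to 1), the sample sizes, and the function name `rejection_rate` are assumptions made for illustration and do not come from the thread:

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(n, reps=4000, alpha=0.05, seed=0):
    """Estimate how often a two-sided z-test rejects a true H0.

    Data are Exponential(1), so the true mean and sd are both 1 (known sigma).
    """
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    mu0, sigma = 1.0, 1.0
    rejections = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > critical:
            rejections += 1
    return rejections / reps


# Compare small, rule-of-thumb, and larger sample sizes.
rates = {n: rejection_rate(n) for n in (5, 30, 200)}
for n, rate in rates.items():
    print(f"n={n:4d}  estimated type I error rate = {rate:.3f}")
```

If the normal approximation were exact, each estimated rate would sit near 0.05 up to simulation noise. How far the small-$n$ rates drift from that depends on the underlying distribution, which is why $n > 30$ is only a guideline.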

0 Answers