T-test fails as sample size increases. Is there a solution?

Question

Please consider the table below:

Item	Sample Mean (PSI)	Sample Count, n	Sample Standard Deviation (PSI)	Standard Error, SD/sqrt(n) (PSI)
Unit A	95.9461	430	3.8397	0.185166776
Unit B	94.488	25	2.5344	0.50688

Now I want to test whether the means of these two units are significantly different from each other:

$$ \frac{94.488 - 95.9461}{ \sqrt{0.50688^2+0.185166776^2}} \approx -2.7 $$
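
For readers who want to check the arithmetic, here is a minimal sketch that recomputes the standard errors and the statistic from the summary table. The function names are made up for illustration; only the numbers come from the table above.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def z_statistic(mean1, se1, mean2, se2):
    """Difference in means divided by the combined standard error."""
    return (mean1 - mean2) / math.sqrt(se1**2 + se2**2)

# Summary statistics from the table (PSI)
se_a = standard_error(3.8397, 430)   # Unit A
se_b = standard_error(2.5344, 25)    # Unit B

# Same ordering as the formula above: Unit B minus Unit A
z = z_statistic(94.488, se_b, 95.9461, se_a)
print(f"SE_A = {se_a:.6f}, SE_B = {se_b:.6f}, z = {z:.3f}")
```

Note that Unit B's standard error dominates the denominator, so the small sample is already being penalized for its size.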

Now, according to this website, these two units are significantly different!? Even if I had the true standard deviation (in this case it's probably similar), I would still get similar results. Dividing by the sample count completely destroys any kind of SE, as you can see: as n approaches infinity, SE approaches 0. You could make any two averages significantly different from each other with a large enough sample size, simply because you are dividing by a smaller and smaller number. Intuitively, I don't think there is any difference.

Suggestion 1: I don't think this addresses the problem I am getting at. When doing a Z-test, we would be using standard deviation, but with a t-test, or in this case, we divide by n. This doesn't make sense to me, we should inherently have a larger SE with a sample than compared to a population because we don't have as many data points.

Edit: for those asking for the formula from the website.

If the Z-statistic is less than 2, the two samples are the same.
If the Z-statistic is between 2.0 and 2.5, the two samples are marginally different.
If the Z-statistic is between 2.5 and 3.0, the two samples are significantly different.
If the Z-statistic is more than 3.0, the two samples are highly significantly different.

$$ Z = \frac{\bar{X_1}-\bar{X_2}}{\sqrt{\sigma^{2}_{x_1}+\sigma^{2}_{x_2}}} $$

Edit: To further clarify, below is a histogram of my tests. How can we possibly look at these two distributions and say that they are significantly different? Especially when I only have 25 samples for one!

What formula do you use to calculate the z-statistic you mention in your edit? — Dave, Jun 01 '23 at 15:43
You divide by $n$ in either case. Which do you think is more accurate, i.e., has a smaller standard deviation - the sample mean based on 2 observations or the sample mean based on 2 million observations? — jbowman, Jun 01 '23 at 15:45
"You could make any two averages significantly different from each other with a large enough sample size, simply because you are dividing by a smaller and smaller number." That's not a flaw! It's exactly the point of a hypothesis test. A hyp. test is not really answering "Are these two means different?" but rather "Did we collect enough data to measure these two means precisely enough to be confident about which one is larger?" — civilstat, Jun 01 '23 at 15:46
I am using this website. I wanted to show that the two averages for my test were basically the same, but according to this, they are significantly different. http://homework.uoregon.edu/pub/class/es202/ztest.html — ninjaboy667, Jun 01 '23 at 15:46
"$\sigma_{x_1}$ is the standard deviation of sample one divided by the square root of the number of data points" "$\sigma_{x_2}$ is the standard deviation of sample two divided by the square root of the number of data points" These two quotes seem to say that a z-test also divides by the sample size. — Dave, Jun 01 '23 at 15:48
In the example, they divide the dispersion by 5 (because n = 25). I don't have the population standard deviation; that's why I divide by sqrt(n) to get SE, similar to the example. http://homework.uoregon.edu/pub/class/es202/ztest.html — ninjaboy667, Jun 01 '23 at 15:52

score 5 · Accepted Answer · answered Jun 01 '23 at 16:23

The description of the approach is crude, to say the least, but the inference is correct: those two samples have a statistically significant mean difference. A histogram reveals very little here because you are dealing with over 430 observations, binning at intervals of 5 PSI when the mean difference is 1.5 PSI.

Note: the title says t-test, but you are performing a z-test here. There are separate camps when it comes to qualifying statistical significance with descriptors like "marginally" or "highly". I'm in the don't-do-it camp.

Industrial statistics often seeks very, very high levels of precision. As anyone who has worked with high pressure equipment can testify, a mean difference of 1.5 PSI can have far reaching consequences in terms of the safety and reliability of equipment. The range of the samples seems to be a completely separate issue.

AdamO

The binning intervals are 2 PSI, and I agree a 1.5 PSI difference could be a significant difference, but in my line of work, anything under ~5 PSI wouldn't be considered significantly different. One of the issues is that there might be outliers in the data, as you point out that the sample ranges are very different, which is partly why I believe that the means should really be of negligible difference. – ninjaboy667 Jun 01 '23 at 16:46
@ninjaboy667 It sounds like you could be interested in an equivalence test. – COOLSerdash Jun 01 '23 at 16:50
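
A rough sketch of what an equivalence test could look like here, using two one-sided z-tests (TOST) on the summary statistics. The ±5 PSI margin is an assumption taken from the "~5 PSI" remark above, the function name is hypothetical, and the normal approximation is crude with only 25 observations in one group, so treat this as an illustration and not a recommended analysis.

```python
import math
from statistics import NormalDist

def tost_z(diff, se, margin):
    """Two one-sided z-tests for equivalence within +/- margin.

    Returns the larger of the two one-sided p-values; a small value
    supports the claim that the true difference lies inside the margin.
    """
    norm = NormalDist()
    # H0: true difference <= -margin  vs  H1: true difference > -margin
    p_lower = 1 - norm.cdf((diff + margin) / se)
    # H0: true difference >= +margin  vs  H1: true difference < +margin
    p_upper = norm.cdf((diff - margin) / se)
    return max(p_lower, p_upper)

# Summary statistics from the question (PSI)
diff = 95.9461 - 94.488
se = math.sqrt(3.8397**2 / 430 + 2.5344**2 / 25)

# Assumed equivalence margin of 5 PSI (not a value fixed by the thread)
p_tost = tost_z(diff, se, margin=5.0)
print(f"TOST p-value = {p_tost:.3g}")
```

With these numbers, the same data that reject "no difference" also reject "a difference of 5 PSI or more", which is the distinction the comments are drawing.
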
@ninjaboy667 what seems to be troubling you is the difference between statistical and practical significance. With a large enough sample size you can eventually detect a "statistically significant" difference of truly small magnitude, as your question notes. How much that difference matters in practice must be based on subject-matter knowledge, not statistical inference alone. – EdM Jun 01 '23 at 16:53
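
To make the sample-size effect concrete, here is a small sketch that holds a mean difference fixed and lets the per-group sample size grow. The 0.5 PSI difference and the equal group sizes are hypothetical choices for illustration; the standard deviations are the ones from the question's table.

```python
import math

def z_for_equal_n(diff, sd1, sd2, n):
    """z-statistic for a fixed mean difference when both groups have n observations."""
    se = math.sqrt(sd1**2 / n + sd2**2 / n)
    return diff / se

# Hypothetical 0.5 PSI difference, SDs taken from the question's table
for n in (25, 100, 400, 1600, 6400):
    print(n, round(z_for_equal_n(0.5, 3.8397, 2.5344, n), 2))
```

The statistic grows in proportion to sqrt(n), so quadrupling the sample size doubles it while the difference itself stays just as small in practical terms.
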
@COOLSerdash That's a very good point. I will look into equivalence tests; I have not heard of them before. I guess I thought I could use a t-test and/or a z-test to check for equivalence, when in reality most of my data will have large enough sample sizes that I am always going to get marginal differences in the means (that are statistically significant). – ninjaboy667 Jun 01 '23 at 17:27
@EdM, yes, I realize the mistake. I will basically always detect a "statistically significant" difference, however small, in a lot of the stuff I am testing because I am always going to have large samples for my line of work. – ninjaboy667 Jun 01 '23 at 17:30
Maybe more useful than a test would be a confidence interval for the difference. If you have large datasets and a tiny difference in means, you may get a CI that's quite narrow around a value that is negligibly small. This way you can say with high confidence that the difference in means is not practically significant, which seems more relevant to you than whether it's statistically significant. – civilstat Jun 01 '23 at 19:01
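
A minimal sketch of such an interval, computed from the question's summary statistics with a normal approximation. The function name is hypothetical and the 95% level is an assumption; a t-based (Welch) interval would be slightly wider given the group of 25.

```python
import math
from statistics import NormalDist

def diff_ci(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Normal-approximation confidence interval for mean1 - mean2."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean1 - mean2
    return diff - z_crit * se, diff + z_crit * se

# Unit A minus Unit B, in PSI
lo, hi = diff_ci(95.9461, 3.8397, 430, 94.488, 2.5344, 25)
print(f"95% CI for the difference: ({lo:.2f}, {hi:.2f}) PSI")
```

The interval excludes zero but sits well below 5 PSI, so it can be compared directly against whatever difference counts as practically important.
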
@ninjaboy667 if you just want a more conservative way to declare "findings" due to your analyses, you should be using a lower alpha level (a stricter significance threshold). The suggestion to change a hypothesis because of a finding is ludicrous, but in either case you should always carry out the prespecified analysis regardless of the results. – AdamO Jun 01 '23 at 19:20
