What is the best method to test the null hypothesis on volatile data?

Question

I am trying to find a good method to test the null hypothesis (H0) on two unpaired samples. These samples come from two different HTTP servers, and the unit I'm using is req/30s (requests completed in 30 seconds).

Even a well-established server is expected to have a standard deviation of 50 requests. I tried using Student's t-test to check the null hypothesis; however, for the following dataset it returns a p-value below the 0.05 threshold:

import scipy.stats as st

A1 = [
  4670, 4646, 4612, 4618, 4646,
  4609, 4623, 4629, 4566, 4628,
  4582, 4636, 4621, 4574, 4624,
  4563, 4651, 4642, 4586, 4621,
  4606, 4628, 4575, 4631, 4646,
  4600, 4594, 4661, 4568, 4611
]
B1 = [
  4630, 4655, 4652, 4633, 4637,
  4661, 4625, 4680, 4647, 4639,
  4633, 4661, 4638, 4621, 4630,
  4682, 4703, 4665, 4652, 4648,
  4673, 4651, 4669, 4646, 4612,
  4654, 4651, 4619, 4637, 4620
]
print(st.ttest_ind(A1, B1))
# Output:
# Ttest_indResult(statistic=-4.855056212284194, pvalue=9.47100493260572e-06)

In my previous question, Student's t-test on "high" magnitude numbers, I came to understand that because the data are tightly clustered, the low variance makes the difference statistically significant. However, for my use case, this level of variation shouldn't cause the null hypothesis to be rejected. I considered comparing the means of the two samples and treating the hypothesis as true if the difference is less than X. However, because the data are volatile, I can't trust the mean: sometimes the results span a wide range (for instance, 4000 to 10000).
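To make the "low variance means significant" point concrete, here is a sketch that recomputes Student's pooled-variance t statistic by hand for A1 and B1 and compares it with `st.ttest_ind`. It assumes NumPy and SciPy are installed; `pooled_t` is a hypothetical helper name, not part of either library.

```python
import numpy as np
import scipy.stats as st

# Samples from the question (requests completed per 30 s window).
A1 = [4670, 4646, 4612, 4618, 4646, 4609, 4623, 4629, 4566, 4628,
      4582, 4636, 4621, 4574, 4624, 4563, 4651, 4642, 4586, 4621,
      4606, 4628, 4575, 4631, 4646, 4600, 4594, 4661, 4568, 4611]
B1 = [4630, 4655, 4652, 4633, 4637, 4661, 4625, 4680, 4647, 4639,
      4633, 4661, 4638, 4621, 4630, 4682, 4703, 4665, 4652, 4648,
      4673, 4651, 4669, 4646, 4612, 4654, 4651, 4619, 4637, 4620]


def pooled_t(a, b):
    """Student's two-sample t statistic using the pooled (equal-variance) SD."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # ddof=1 gives the unbiased sample variance the t-test is built on
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    # Standard error of the difference in means
    se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (a.mean() - b.mean()) / se


print(f"A1: mean={np.mean(A1):.2f}, sd={np.std(A1, ddof=1):.2f}")
print(f"B1: mean={np.mean(B1):.2f}, sd={np.std(B1, ddof=1):.2f}")
print(f"manual t = {pooled_t(A1, B1):.4f}")
print(f"scipy t  = {st.ttest_ind(A1, B1).statistic:.4f}")
```

The observed sample standard deviations come out at roughly 29 (A1) and 21 (B1) req/30s, well below the 50 req/30s the question treats as normal. That is why a gap of about 32 req/30s between the means yields such a large |t|.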

Which methods are recommended for testing whether the difference between these two samples is statistically significant, given these limitations?

If you are concerned about the mean being sensitive to outliers, you might consider whether the median is more appropriate for your task. // Some argue in favor of the mean because of its sensitivity. — Dave, Nov 07 '22 at 00:39

score 3 · Accepted Answer · answered Nov 07 '22 at 01:39

Responses to your previous question† address a hypothesis test on the means for these sample data.

It sounds like you are trying to make a hypothesis test mean something other than what it does.

Or that you know what answer you want, and are trying to find a statistical "test" that will confirm this intuition.

A few notes on your question:

"However, for my use case, this variance shouldn't reject the null hypothesis". Honestly, if you have already decided what conclusion you want to reach, you don't need to conduct any test. Especially in the case of two-sample data, your intuition as to what conclusion you should draw may be more perceptive than a test of statistical significance.

"I thought about comparing the means between two samples and if the difference was less than X, the hypothesis would be true". If this is what you want, you can do this. It doesn't require any special statistical knowledge; it would be just what you said: "The threshold for some conclusion is whether the difference between the mean of A and the mean of B is less than X or not."
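A minimal sketch of that rule, assuming the tolerance X is a business-chosen number of req/30s. `means_within_tolerance` and its `tolerance` parameter are illustrative names, and the tiny samples in the usage lines are made up.

```python
import numpy as np


def means_within_tolerance(a, b, tolerance):
    """Return True if |mean(a) - mean(b)| < tolerance.

    `tolerance` plays the role of X from the question (hypothetical name);
    picking it is a practical/business decision, not a statistical one.
    """
    diff = abs(np.mean(a) - np.mean(b))
    return bool(diff < tolerance)


a = [100, 102, 98]   # made-up sample, mean 100
b = [105, 103, 107]  # made-up sample, mean 105
print(means_within_tolerance(a, b, tolerance=10))  # True: |100 - 105| = 5 < 10
print(means_within_tolerance(a, b, tolerance=3))   # False: 5 is not < 3
```

No p-value is involved: the decision depends only on the chosen X, which is exactly what the answer describes.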

"However, since the data is volatile I can't trust the mean". If this is a concern, you are free to use a trimmed mean or perhaps the median or maybe the geometric mean, or some other statistic of central tendency.
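A sketch comparing these measures of central tendency on one made-up sample containing a single extreme value, in the spirit of the 4000 to 10000 swings mentioned in the question. It assumes NumPy and SciPy (`scipy.stats.trim_mean`, `scipy.stats.gmean`); the 10% trimming proportion and the helper name `summarize_center` are arbitrary choices for illustration.

```python
import numpy as np
import scipy.stats as st


def summarize_center(sample, trim=0.1):
    """Several measures of central tendency for one sample.

    trim: fraction cut from EACH tail for the trimmed mean (0.1 = 10%).
    """
    x = np.asarray(sample, dtype=float)
    return {
        "mean": x.mean(),
        "trimmed_mean": st.trim_mean(x, trim),
        "median": np.median(x),
        # The geometric mean requires strictly positive values (request counts are).
        "geometric_mean": st.gmean(x),
    }


# Made-up req/30s values: nine typical windows plus one extreme window.
sample = [4590, 4595, 4600, 4605, 4610, 4615, 4620, 4625, 4630, 10000]
for name, value in summarize_center(sample).items():
    print(f"{name}: {value:.1f}")
```

Here the single outlier drags the arithmetic mean from 4610 (the mean of the nine typical values) up to 5149, while the median and the 10% trimmed mean both stay at 4612.5. The geometric mean is pulled up less than the arithmetic mean, but still noticeably.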

† Really, this is essentially a duplicate question.

For this specific use case, I do have my conclusion, but I'm still testing it because I intend to include this approach in an automated tool. Hence, what I'm doing is checking whether the p-value is reliable from my business perspective, or, as you wrote, I'm trying to find a statistical test that will confirm this intuition.
I'm sorry for the duplicate question; I was instructed to create a new question if my concern differed from the original one. — Rafael, Nov 07 '22 at 12:04
@Rafael, no worries. One thing is that hypothesis tests mean less than we often imply that they do. I mean, a t-test determines whether there's a detectable difference in means relative to the variability in the data. If the sample size is large enough, and there is some real difference in the means, then the t-test should return a significant result. Whether this difference in means is of any practical importance is an entirely different question. — Sal Mangiafico, Nov 08 '22 at 20:42
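A small sketch of that point using `scipy.stats.ttest_ind_from_stats`: the difference in means (a hypothetical 10 req/30s) and the standard deviation (50 req/30s, the figure from the question) are held fixed, and only the per-group sample size n changes. The helper name `p_value_for_fixed_difference` is illustrative.

```python
import scipy.stats as st


def p_value_for_fixed_difference(diff, sd, n):
    """Two-sided p-value of Student's t-test for two groups of size n whose
    means differ by `diff` and which share the standard deviation `sd`."""
    result = st.ttest_ind_from_stats(
        mean1=0.0, std1=sd, nobs1=n,
        mean2=diff, std2=sd, nobs2=n,
        equal_var=True,
    )
    return result.pvalue


# Hypothetical scenario: same 10 req/30s gap, same SD of 50 req/30s.
for n in (10, 30, 100, 1000, 10000):
    print(n, p_value_for_fixed_difference(diff=10.0, sd=50.0, n=n))
```

The p-value shrinks as n grows even though the 10 req/30s gap stays exactly the same. Whether that gap matters in practice is a separate question, and the test does not answer it.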
