
I am trying to study the influence of a Treatment Model (denoted by 1) versus a control group (denoted by 0). The goal of my statistical analysis is to be able to say with a certain confidence that the Treatment model has an influence on sales. In other words, I want to test the null hypothesis that the Treatment has no effect on sales, i.e. that the difference observed in Revenue between the two groups is due to random variation rather than to the Treatment model. I have no a priori knowledge about the distribution and also have little data to work with.

Hence, I have decided to perform a permutation test. My question arises at exactly this point. If I use the median difference in Revenue between the two models as my test statistic, I get a p-value of about 0.15, which is considerably high, although the data in the industry under analysis is also very noisy. However, if I use the mean difference in Revenue between the two models as my test statistic, I get a p-value of about 0.05. This is a considerable difference and has a huge impact on the conclusions I want to draw from this experiment. So my questions are:

  1. What is the best test statistic to use in this case?
  2. Additionally, the fact that the median and the mean give such different results suggests that the data is skewed. What should I do in such a case?

Please see an example of my sample data below.

Location      MODEL       Revenue
A               0        -200.73
A               1        -300.42
A               0         153.02
B               0          40.23
C               0         300.07
B               0         -599.10
C               0         323.47
D               1          14.37
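A minimal sketch of the permutation test described above, written in Python with NumPy because the question itself contains no code. It assumes that MODEL is the only grouping that matters (Location is ignored), uses a two-sided test on the difference of a chosen statistic, and runs on the eight sample rows only to show the mechanics; with just two treated rows, its output should not be read as evidence either way. The names `permutation_test`, `n_resamples` and `seed` are illustrative, not taken from the question.

```python
import numpy as np


def permutation_test(treated, control, statistic, n_resamples=10_000, seed=0):
    """Two-sided permutation test for statistic(treated) - statistic(control).

    Under the null hypothesis that MODEL has no effect on Revenue, the group
    labels are exchangeable, so we reshuffle them and recompute the difference.
    """
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    pooled = np.concatenate([treated, control])
    n_t = len(treated)

    observed = statistic(treated) - statistic(control)
    count = 0
    for _ in range(n_resamples):
        shuffled = rng.permutation(pooled)
        diff = statistic(shuffled[:n_t]) - statistic(shuffled[n_t:])
        # Small tolerance so floating-point noise does not exclude ties
        if abs(diff) >= abs(observed) - 1e-9:
            count += 1
    # +1 in numerator and denominator counts the observed labelling itself
    p_value = (count + 1) / (n_resamples + 1)
    return observed, p_value


# The eight sample rows from the question (Location is ignored in this sketch)
model = np.array([0, 1, 0, 0, 0, 0, 0, 1])
revenue = np.array([-200.73, -300.42, 153.02, 40.23, 300.07, -599.10, 323.47, 14.37])
treated, control = revenue[model == 1], revenue[model == 0]

for name, stat in [("mean", np.mean), ("median", np.median)]:
    obs, p = permutation_test(treated, control, stat)
    print(f"{name:>6} difference = {obs:9.2f}   p = {p:.3f}")
```

Swapping `np.mean` for `np.median` changes only the statistic, which is exactly the choice the question is about; the resampling scheme stays the same. With this few rows (only C(8, 2) = 28 distinct ways to assign the two treatment labels), enumerating every relabelling exactly would also be feasible instead of sampling at random.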

Many thanks in advance.

  • Are you interested in the mean or the median? It sounds like neither; it sounds like you’re interested in overall distribution differences, not any particular value. – Dave Oct 29 '20 at 11:34
  • @Dave Thanks for your question. Actually, my ultimate goal is to answer the question: does the Treatment have an influence on sales? I think you are right that I am more interested in the overall distribution differences. How should I approach it then? I believe that looking at the mean may give outliers a larger influence, so it is probably not the best approach. However, it gives me a sound p-value. – Manuel Fernandes Oct 29 '20 at 11:45
  • @ManuelFernandes, it really comes down to what you are more interested in. You can test for differences in overall distribution using the two-sample Kolmogorov-Smirnov test, but it also comes with a cost: that test has less power to detect differences in mean than the permutation mean test you have already run. – svendvn Oct 29 '20 at 12:57
  • @svendvn Thank you for your comment and advice. But suppose, for example, that I go with a permutation test, since in my particular case I want to be convinced that the Treatment model has an impact on sales. What is the most appropriate test statistic? On the one hand, the median difference is less influenced by outliers in the revenue, while the mean difference is more affected by them (hence the low p-value). Which one gives me the best picture for my analysis? – Manuel Fernandes Oct 29 '20 at 13:59
  • First use your domain knowledge to consider what you want to test. Then we can get into how best to estimate and test that value. – Dave Oct 29 '20 at 14:03
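The two-sample Kolmogorov-Smirnov test suggested in the comments can also be run inside the same permutation framework the question already uses. A minimal sketch in Python with NumPy, assuming the statistic D = max |F_treated(x) - F_control(x)| over the two empirical CDFs. Here the p-value comes from reshuffling the MODEL labels, not from the KS reference distribution used by, for example, `scipy.stats.ks_2samp`. The function names `ks_statistic` and `ks_permutation_test` are hypothetical, chosen for illustration.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest vertical gap
    between the empirical CDFs of `a` and `b`."""
    a, b = np.sort(np.asarray(a, dtype=float)), np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # the ECDFs only jump at observed values
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def ks_permutation_test(treated, control, n_resamples=10_000, seed=0):
    """Permutation p-value for D, reshuffling MODEL labels under the null."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([treated, control]).astype(float)
    n_t = len(treated)
    observed = ks_statistic(treated, control)
    count = 0
    for _ in range(n_resamples):
        shuffled = rng.permutation(pooled)
        # D is one-sided by construction: larger gaps are more extreme
        if ks_statistic(shuffled[:n_t], shuffled[n_t:]) >= observed - 1e-9:
            count += 1
    return observed, (count + 1) / (n_resamples + 1)


# The eight sample rows from the question (Location is ignored in this sketch)
model = np.array([0, 1, 0, 0, 0, 0, 0, 1])
revenue = np.array([-200.73, -300.42, 153.02, 40.23, 300.07, -599.10, 323.47, 14.37])
d, p = ks_permutation_test(revenue[model == 1], revenue[model == 0])
print(f"D = {d:.3f}   p = {p:.3f}")
```

Because D depends only on the ordering of the Revenue values, a single extreme observation cannot dominate it the way it can dominate the mean difference.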
