My Data: Respondents were asked to evaluate the quality of two products on a scale of 0-10. There were 12 criteria that constituted the grading scheme, and I would like to analyze the scores overall (as an average of the scores across all the criteria) and per criterion.

Each respondent assessed both products. There were 25 respondents in total, meaning 25 scores for each of the two products.

Objective: To see if one product is better than the other.

Possibilities: The statistical test should be one that finds group differences while accounting for paired data since the scores for each product are linked by the respondent (because the respondents in both cases are the same). My data is definitely non-normal.

I initially considered the Wilcoxon signed-rank test, although I could not derive p-values (which my collaborators would prefer) due to ties.

I then considered permutation-based tests and, after some reading, I am also wondering whether the Kolmogorov-Smirnov (K-S) test might be appropriate for my goals.

I have experimented with using both the median and the mean difference as the test statistic for my permutation test, and whilst most of the p-values are comparable between the two approaches (I guess around two-thirds), some are vastly different (differing by between 0.1 and 0.5).
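For concreteness, here is a minimal sketch of the kind of paired permutation test described above. It assumes a sign-flip scheme (randomly swapping the two products' labels within each respondent), which is one common way to permute paired data and not necessarily the exact procedure used here. The function name `paired_permutation_test` and the example scores are made up for illustration.

```python
import numpy as np

def paired_permutation_test(a, b, statistic=np.mean, n_perm=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired differences.

    `statistic` must accept an `axis` argument (np.mean and np.median do).
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = statistic(d)
    # Under the null, swapping the two products within a respondent
    # only flips the sign of that respondent's difference.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = statistic(signs * d, axis=1)
    # Count permutations at least as extreme as the observed statistic;
    # the small tolerance guards against floating-point ties.
    extreme = np.sum(np.abs(null) >= abs(observed) - 1e-12)
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (1 + extreme) / (n_perm + 1)
    return float(observed), float(p_value)

# Made-up scores for ten hypothetical respondents (not the real data).
product_a = [7, 8, 6, 9, 7, 5, 8, 7, 6, 9]
product_b = [6, 8, 5, 7, 7, 6, 6, 7, 4, 8]
for name, stat in (("mean", np.mean), ("median", np.median)):
    obs, p = paired_permutation_test(product_a, product_b, statistic=stat)
    print(f"{name} difference = {obs:.2f}, p = {p:.4f}")
```

Because the scores are integers, the median of the sign-flipped differences can take only a few distinct values, so its permutation distribution is much coarser than that of the mean. That coarseness is one plausible reason the two statistics sometimes give quite different p-values.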

My objective is simply to know if one product is better than the other. Since all of these approaches compare group means but in different ways, I am not sure which would be preferred in my circumstances. I think presenting the results of multiple tests gives a fairer all-round picture, but in order to do that I will also have to understand how the differences in the conclusions of the tests arise, i.e., what causes them.

Questions

  1. Does one test - either one I've mentioned or some other one - make more sense than the others in my circumstances?

  2. If I opt for the permutation test, does either the median or the mean make more sense in my scenario?

  3. I will probably end up presenting both and either use one as the main statistic with the other as a sensitivity analysis or simply present both and discuss the results of both statistics combined. In this case, though, I have to make sure I can describe the results adequately. Hence:

a) If there are large differences between the median and the mean when I use the permutation test, how can I understand what these differences tell me?

b) I know that it's evidence that the median and mean are at different locations but what is the deeper meaning of this in relation to my objectives and how does it influence how I transmit the results?

Derek
  • Thanks for your input @SextusEmpiricus. We are definitely at the analysis phase, I am just a little unsure about which test statistic would make most sense to use. I think that if the median and mean give different p-values for a given observation, then the information they transmit is different, and I'd like to be able to interpret what that means for our results – Derek Jun 05 '23 at 09:54
  • Thanks for your suggestions and time @SextusEmpiricus. I completely agree about letting my "eyes decide" in this scenario, but unfortunately, my collaborators (and probably the reviewers of the paper) will not stand for this. They feel it is too subjective, non-scientific, and expect p-values – Derek Jun 05 '23 at 09:58
  • The use of p-values might seem scientific, but it would be an abuse of science. See also https://en.m.wikipedia.org/wiki/How_to_Lie_with_Statistics Anyway, even if you would like to continue down this path, it is better to ask a more specific question once you know what sort of pattern/hypothesis you would like to test. Performing multiple tests until something succeeds is not an approach for which a "how to" can easily be given. – Sextus Empiricus Jun 05 '23 at 10:02
  • I am familiar with the contents of the book! :) Maybe I didn't put enough emphasis on this in my question but part of the problem is indeed that I don't know what type of pattern/hypothesis I would like to test beyond "is there a difference between the products". Since this can be assessed in multiple ways (mean, median, overall distribution, visually, etc) I thought I would post a question. – Derek Jun 05 '23 at 10:06

1 Answer

If there are large differences between the median and the mean when I use the permutation test, how can I understand what these differences tell me?

This sounds like exploratory research.

If you are still in that phase, then it is of little use to derive p-values. You will be looking at too many patterns and approaches: difference in mean, difference in median, difference in stochastic dominance (rank test), with or without some data transformation, eliminating some outliers or not, etc.

So it would be best to plot these data, look at the tables, and potentially perform some additional analyses such as PCA, LDA, and clustering.

Then, to decide on significance, you should let your eyes decide whether any observed effect is strong enough. The reason is that any computation would give a false sense of computing some rigorous probability while the underlying approach and numbers are not rigorous.

Thus, the test to use is the interocular trauma test, discussed here: Source for inter-ocular trauma test for significance

If you need a more quantitative result, then you need to gather additional data. It is bad news, but I guess that it is better than the suggestion to fool your collaborators with a calculation that seems accurate but makes no sense.

  • Thanks Sextus. Something I would like clarification on is why would a permutation test here "give a false sense of computing some rigorous probability while the underlying approach and numbers are not rigorous". My understanding tells me this is a valid approach, the only hitch in my scenario is that the p-vals of the mean and median differ for some observations. I don't think this invalidates the approaches, they just tell me different information. If I knew what the differences between the information they convey were, then I would not have a problem I think. Am I wrong? – Derek Jun 05 '23 at 10:04
  • @Derek the problem is not necessarily the test itself, but the test procedure. If you don't know in advance what you are looking for, and if you don't even have an idea about the potential meaning of the differences that you might find, then your test will be biased because you are looking at many different hypotheses (like the multiple comparisons problem). – Sextus Empiricus Jun 05 '23 at 10:08
  • I considered the problem of multiple comparisons but decided I didn't think it applied here. Assuming a null hypothesis of no difference between the groups, then multiple comparisons would mean each group has an equal probability of having higher scores or lower scores just by chance, meaning the problem of multiple comparisons cancels out – Derek Jun 05 '23 at 10:13
  • @Derek "Assuming a null hypothesis of no difference between the groups" You have a single hypothesis of no difference, but you can measure difference in many different ways. See the example at the bottom of the answer here: https://stats.stackexchange.com/a/470512/ . By allowing yourself to test multiple potential ways in which the samples can differ, you will have a larger probability of finding a difference with some computed p-value below a particular level. – Sextus Empiricus Jun 05 '23 at 11:44
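
The point in the last comment can be illustrated with a small simulation. This is only a sketch under stated assumptions: both products' scores are drawn from the same distribution (uniform integers from 0 to 10, an arbitrary choice), so there is no true difference, and each dataset is tested with a sign-flip permutation test using both the mean and the median. The names `sign_flip_p` and `rejection_rates` are hypothetical.

```python
import numpy as np

def sign_flip_p(d, statistic, signs):
    """Two-sided sign-flip permutation p-value for paired differences d."""
    observed = abs(statistic(d))
    null = np.abs(statistic(signs * d, axis=1))
    return (1 + np.sum(null >= observed - 1e-12)) / (signs.shape[0] + 1)

def rejection_rates(n_sim=400, n=25, n_perm=499, alpha=0.05, seed=1):
    """Fraction of null datasets declared 'significant' by each rule."""
    rng = np.random.default_rng(seed)
    hits = np.zeros(3)
    for _ in range(n_sim):
        # Both products come from the same distribution: the null is true.
        a = rng.integers(0, 11, size=n)
        b = rng.integers(0, 11, size=n)
        d = (a - b).astype(float)
        signs = rng.choice([-1.0, 1.0], size=(n_perm, n))
        p_mean = sign_flip_p(d, np.mean, signs)
        p_median = sign_flip_p(d, np.median, signs)
        # Third rule: report whichever statistic gives the smaller p-value.
        hits += [p_mean < alpha, p_median < alpha,
                 min(p_mean, p_median) < alpha]
    return dict(zip(("mean", "median", "either"), hits / n_sim))

print(rejection_rates())
```

By construction, the "either" rate can never be lower than the rate for the mean or the median alone, and in general it lies between the larger of the two and their sum. Each extra way of measuring "a difference" can therefore only keep or raise the chance of a false positive, even though there is a single null hypothesis.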