2

I have the price per bottle (`Preis Normal`) of a certain drink; the only difference between the columns is the packaging. The Magnum is double the size of the normal bottle, so `Preis Magnum / 2` should give the same value as `Preis Normal`, which isn't the case. Each row represents a point in time at which the prices were gathered.

  `Preis Normal` `Preis Magnum / 2` `Preis Magnum`
           <dbl>              <dbl>          <dbl>
1           372.               470            940 
2           857.              1109.          2218.
3           661.               864.          1728.
4           813.               813.          1627.

First I made boxplots of the two columns to get a graphical overview.

[Image: boxplots of `Preis Normal` and `Preis Magnum / 2`]

They differ substantially.

Running the Wilcoxon test in R the following way

wilcox.test(PichonMag$`Preis Normal`, PichonMag$`Preis Magnum / 2`, paired = FALSE, exact = TRUE, alternative = "less")

gives me the following output:

Wilcoxon rank sum test

data: PichonMag$`Preis Normal` and PichonMag$`Preis Magnum / 2`
W = 5, p-value = 0.2429
alternative hypothesis: true location shift is less than 0

With this p-value I can't reject H0, which I take to mean that the prices don't differ.

I still don't get why, as they differ substantially (at least with my real life experience). Did I make a mistake here?

MaxT
  • 25
  • 3
    A sample of $4$ of each bottle is not many especially if one pair is tied – Henry May 28 '21 at 22:27
  • 3
    Non-significant doesn't mean that there is no difference, only that you could get data like yours if there were no difference in the broader population. I.e. that you could get data like yours just by "luck of the draw". The total number of times any of the magnum prices gets beaten by one of the normal prices is 5, and you could easily get data like that just by luck of the draw: it is within the 76% most likely things that could happen by luck of the draw, hence a p-value of 24%. – Phil Assheton May 28 '21 at 22:31
  • Thank you @PhilAssheton. I totally understand what you are saying. The issue is that I ran the Wilcoxon test to find out whether it is legitimate to include the data for the magnum bottles in my dataset at all, or whether it will bias my overall results. Would you include the Magnums even with a p-value of 0.24? – MaxT May 28 '21 at 22:36
  • (If you do indeed just have the 2 lots of 4 datapoints shown there, the only sort of dataset you could not fairly easily get by luck would be if all the datapoints from one group beat all from the other) – Phil Assheton May 28 '21 at 22:37
  • What is the dataset in general? Is it just the other four datapoints? And what is its goal? What sort of analysis do you hope to run? You might be able to add magnum as a controlling variable, perhaps if you ran it as a regression-type thing. There just isn't enough data here to tell if they are from the same distribution or not. – Phil Assheton May 28 '21 at 22:38
  • The dataset is larger. This subset just shows the points in time at which prices for normal bottles and magnums were both found. Before and after there are many more datapoints (for normal bottles, and a few for magnums).

    I didn't see another way to check whether the prices of the normal bottles and the magnums differ significantly.

    The reason I'm checking for significant differences in prices is to decide whether including the magnum prices in the sample is meaningful or whether they should be dropped.

    – MaxT May 28 '21 at 22:42
  • 1
    It's hard to answer the question of whether you can include the extra data without understanding the data better, and what your end goal is (e.g., as I say, if your final analysis would be a regression it might be possible to use a controlling variable to pull out differences between them -- might being a big word). For sure, testing four datapoints for significance will tell you almost nothing. – Phil Assheton May 28 '21 at 22:51
  • The regression model will be based on Regression Discontinuity in time, where the discontinuity is based on a rating at a certain point in time. For this reason I want to make sure that the prices of the magnums don't bias the outcome heavily. – MaxT May 28 '21 at 22:59
  • Ugh I'm sorry, I don't think I can help you. I'd need to get really involved with exactly what you are doing and understanding your data, your problem and your analysis, to get a feel for whether I would include these data or not. Unfortunately, though, I don't think the significance test is going to help you. – Phil Assheton May 28 '21 at 23:18
  • With four pairs of values, the smallest two-sided p-value a Wilcoxon test can report is 0.125. That is the p-value when all four differences go in the same direction. – Harvey Motulsky May 28 '21 at 23:28
  • Of course I know that the more datapoints there are in this case, the better. @HarveyMotulsky, is there a "minimum" number of pairs that you would recommend? – MaxT May 28 '21 at 23:34
  • 1
    The best graphical overview is to plot the raw data. The medians here are the average of the 2nd and 3rd ranked points in each set. In showing 5 levels from 4 data points the box plot is doing mechanically what it is supposed to do, but it can't be more informative or easier to think about than the raw data. – Nick Cox May 29 '21 at 06:50
  • Using a statistical test to guide the rest of the analysis is not very good practice. – Frank Harrell May 29 '21 at 12:13
  • Why wouldn't you use a paired test? The observations were made in a matched manner, so a test that uses that matching seems appropriate. Have I missed something? – David Smith Jun 01 '21 at 11:41
  • The sample size is somewhat small. All the paired differences are in the same direction though one is near zero. A sign test gives a one-sided p of 1/16 and a two-sided p of 1/8. For five pairs, all with a difference in the same direction, the corresponding p values are 1/32 and 1/16. In other words, 4 pairs of observations seems too small, 5 marginal, and 6 minimal for this test. I did not compute a Wilcoxon signed rank test, but the sample sizes are small for it also, since it depends on the permutations. – David Smith Jun 01 '21 at 11:49
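
The small-sample limits quoted in the comments above can be checked directly in R. This is a minimal sketch, assuming no zero differences and no ties; the paired toy values in the last call are made up for illustration and are not the price data from the question.

```r
# Smallest attainable P-values of the sign test when all n paired
# differences point in the same direction (no zero differences assumed).
n_pairs <- 4:6
one_sided <- sapply(n_pairs, function(n)
  binom.test(n, n, p = 0.5, alternative = "greater")$p.value)
two_sided <- sapply(n_pairs, function(n)
  binom.test(n, n, p = 0.5, alternative = "two.sided")$p.value)
print(data.frame(n_pairs, one_sided, two_sided))

# Wilcoxon signed rank test on made-up paired data in which all four
# differences have the same sign and distinct magnitudes.
signed_rank_p <- wilcox.test(c(1, 2, 3, 4), c(2, 4, 6, 8), paired = TRUE)$p.value
print(signed_rank_p)
```

With four pairs, neither paired test can reach a two-sided P-value below 1/8, however large the differences are.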

1 Answer

5

The 2-sample Wilcoxon rank sum test considers only the ranks of the two samples. In a two-sided test, it is (just barely) possible to get a significant result at the 5% level with two groups of four, but only if all of the observations in one group are below all of the observations in the other. Then the P-value of the test is $2/{8\choose 4} = 2/70 = 0.02857 < 0.05,$ as below.

wilcox.test(1:4, 5:8)
    Wilcoxon rank sum test

data: 1:4 and 5:8
W = 0, p-value = 0.02857
alternative hypothesis: true location shift is not equal to 0

Otherwise, with some overlap between the two groups (or with fewer than 4 observations in both groups), a P-value below 5% is not possible. Your boxplots show overlapping data.
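
The reason is a counting argument: under the null hypothesis each of the $\binom{m+n}{m}$ ways of assigning ranks to the first group is equally likely, so the smallest attainable two-sided P-value is $2/\binom{m+n}{m}$. A minimal sketch in R, assuming no ties (the helper name `min_p_two_sided` is made up for illustration):

```r
# Smallest attainable two-sided P-value of the exact rank sum test
# for group sizes m and n, assuming no ties (hypothetical helper).
min_p_two_sided <- function(m, n) 2 / choose(m + n, m)

sizes <- c(3, 4, 5, 6)
min_p <- sapply(sizes, function(k) min_p_two_sided(k, k))
print(data.frame(group_size = sizes, min_p = round(min_p, 5)))
```

For two groups of three the floor is 2/20 = 0.1, so the 5% level is out of reach; with two groups of four it is 2/70.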

Only the ranks matter. With two samples of size four, the 2-sided, 2-sample Wilcoxon test will not give a P-value below 5% if there is any overlap at all, not even if some of the values in one group are hugely larger than values in the other group.

wilcox.test(c(1,2,3,5), c(4, 90, 200, 1000))
    Wilcoxon rank sum test

data: c(1, 2, 3, 5) and c(4, 90, 200, 1000)
W = 1, p-value = 0.05714
alternative hypothesis: true location shift is not equal to 0
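
To see where these P-values come from, one can enumerate the exact null distribution of W for two groups of four. The following is a rough sketch, assuming no ties; the helper names `p_lower` and `p_two_sided` are made up for illustration, and the two-sided rule (double the smaller tail, capped at 1) is my reading of how the exact test is usually computed.

```r
# Enumerate the exact null distribution of W for two groups of four
# (no ties assumed). Each choice of ranks for the first group is equally likely.
m <- 4
n <- 4
rank_sets <- combn(m + n, m)                 # one column per possible rank assignment
W <- colSums(rank_sets) - m * (m + 1) / 2    # rank sum of group 1 minus its minimum

# One-sided P-value for alternative = "less" (hypothetical helper).
p_lower <- function(w) mean(W <= w)

# Two-sided P-value: twice the smaller tail, capped at 1 (hypothetical helper).
p_two_sided <- function(w) min(1, 2 * min(mean(W <= w), mean(W >= w)))

print(ncol(rank_sets))   # number of equally likely arrangements
print(p_lower(5))        # compare with the one-sided output in the question
print(p_two_sided(0))    # complete separation
print(p_two_sided(1))    # one overlapping observation
```

Of the 70 arrangements, 17 have W at most 5, which gives the 17/70 = 0.2429 reported in the question.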

BruceET
  • 56,185