
How do you define "symmetric" here? In a paired test scenario, if one treatment consistently produces better outcome than the other treatment, I'd imagine the differences to be clustered on the "positive" sign, without many "negatives". Wouldn't that mean the distribution of the differences is NOT symmetric (unequal numbers of positives and negatives) if H0 is false?

mp6
  • It would be helpful if you could give the exact language of this assumption, and the citation for where you read it. – Sal Mangiafico Feb 01 '23 at 16:36
  • "Assumption #3: The distribution of the differences between the two related groups (i.e., the distribution of differences between the scores of both groups of the independent variable; for example, the reaction time in a room with "blue lighting" and a room with "red lighting") needs to be symmetrical in shape. If the distribution of differences is symmetrically shaped, you can analyse your study using the Wilcoxon signed-rank test." https://statistics.laerd.com/spss-tutorials/wilcoxon-signed-rank-test-using-spss-statistics.php – mp6 Feb 01 '23 at 16:47
  • To your specific question, what the source means is that the differences are symmetric around some value, not necessarily zero. ... But I don't think this is actually an assumption of the test, unless the test is being used specifically as a test of the location (median, mean, etc.). – Sal Mangiafico Feb 01 '23 at 17:04
  • Thank you for the input. What does "symmetric" mean here? With many data points on one side and very few on the other? How symmetric does it need to be? – mp6 Feb 01 '23 at 17:15
  • Here's a related post on reddit with a truly excellent response I suggest you read. – COOLSerdash Feb 01 '23 at 18:06

1 Answer


"Symmetric" means the population distribution of differences to the left of the population mean and median (assumed to be zero) exactly mirrors the population distribution of differences to the right of the population mean and median.
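
In symbols (a sketch in notation not used above, where $D$ is a population difference and $\theta$ is its center): the distribution is symmetric about $\theta$ if

$$P(D \le \theta - x) = P(D \ge \theta + x) \quad \text{for all } x \ge 0,$$

which is the same as saying that $D - \theta$ and $\theta - D$ have the same distribution. The null hypothesis of the signed rank test takes $\theta = 0$.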

That said, (1) if the population distribution is symmetric, the null can be rejected if the differences are not centered on zero, (2) if the population distribution is centered on zero, the null can be rejected if the distribution is asymmetric, and (3) if the population distribution of differences is neither centered on zero nor symmetric, the null can be rejected. How far from zero the center must be, or how asymmetric the distribution must be, before the null is actually rejected depends on power and sample size.

Your case where "one treatment consistently produces better outcome than the other treatment, I'd imagine the differences to be clustered on the 'positive' sign, without many 'negatives'" is one where your sample distribution of differences is not centered on a zero mean and median, and would be evidence favoring rejection of the null. However, your example does not indicate whether the population distribution of differences (or sample distribution of differences, for that matter) is symmetric about its (non-zero) mean and median.
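
Cases (1) and (2) above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses the large-sample normal approximation to the signed rank statistic (no tie or zero corrections) rather than an exact test, the helper names `signed_rank_z` and `two_sided_p` are made up for this illustration, and "centered on zero" in the asymmetric case is taken to mean a population median of zero (the shifted exponential used here has a non-zero mean).

```python
import math
import random
import statistics


def signed_rank_z(diffs):
    """Large-sample z statistic for the signed rank test.

    Illustrative only: zeros are dropped and ties are not corrected for,
    which is reasonable for the continuous data simulated below.
    """
    d = [x for x in diffs if x != 0.0]
    n = len(d)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs(d[i]))
    # W+ is the sum of the ranks belonging to positive differences.
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if d[i] > 0)
    # Mean and standard deviation of W+ under the null hypothesis.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean) / sd


def two_sided_p(z):
    """Two-sided p-value from a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Case (1): symmetric population, but centered on 0.5 rather than zero.
shifted = [rng.gauss(0.5, 1.0) for _ in range(n)]

# Case (2): asymmetric population whose median is exactly zero,
# because the median of an Exponential(1) variable is ln 2.
skewed = [rng.expovariate(1.0) - math.log(2) for _ in range(n)]

p_shifted = two_sided_p(signed_rank_z(shifted))
p_skewed = two_sided_p(signed_rank_z(skewed))

print("sample median, skewed case:", statistics.median(skewed))
print("p-value, symmetric but shifted:", p_shifted)
print("p-value, median zero but asymmetric:", p_skewed)
```

With a sample this large, both p-values should come out very small, even though the second sample has a median close to zero. Smaller samples or milder asymmetry would reject less often, which is the power and sample size point made above.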

Nick Cox
Alexis
  • Thank you so much, Alexis, for your very clear explanation. So, the "symmetric" assumption applies to the POPULATION, not to my SAMPLE. That's where I was confused. Thank you. On a separate note, how do you test this "symmetric" assumption? – mp6 Feb 01 '23 at 17:53
  • @mp6 You may not be able to test it at all because it's only relevant under the null, which you probably don't have. – COOLSerdash Feb 01 '23 at 18:00
  • Hi COOLSerdash, it is suggested here that it's a simple step in SPSS: "(Assumption 3. ... In practice, checking for this assumption just adds a little bit more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis, as well as think a little bit more about your data, but it is not a difficult task." I don't quite understand how this assumption is tested, though, if it applies to the POPULATION. https://statistics.laerd.com/spss-tutorials/wilcoxon-signed-rank-test-using-spss-statistics.php – mp6 Feb 01 '23 at 18:33
  • @mp6, one of the issues here is that the website you cite may not be the best source. The answer here by Alexis, the answer on the page I linked to by Christian Hennig, and some of the discussion on the Reddit site linked to by COOLSerdash are better sources. – Sal Mangiafico Feb 01 '23 at 18:50
  • Unfortunately, it's not easy to find a list of assumptions for this test that is easy to interpret for most of us non-statisticians. As a source for nonparametric tests, I like Conover (1999), Practical Nonparametric Statistics. But his list of assumptions for this test is pretty impenetrable. (They are listed here; search for Conover: www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/signrank.htm ) – Sal Mangiafico Feb 01 '23 at 18:50
  • On the Laerd site, I also wonder about Assumption #1. How can the test accept ordinal data if the first step in conducting the test is subtracting one value from another? – Sal Mangiafico Feb 01 '23 at 18:54
  • @SalMangiafico TY for the kind words. Ordinal data in the measures implies the possibilities of both ties and zeros (for which corrections exist for the signed rank test), but does not imply ordinal data per se in the difference between measures (it just implies positive and negative integers). – Alexis Feb 01 '23 at 22:44
  • @mp6 Also: the signed rank test can reject the null solely due to asymmetry, even if the (sample) differences are centered exactly on zero. $\text{H}_{\text{A}}\text{: }$ Either the population distribution of differences is not centered on zero, or the population distribution of differences is asymmetric, or both these things are true. – Alexis Feb 02 '23 at 02:01
  • @Alexis, I think here you might be giving the page authors a little too much benefit of the doubt. I'll put the beginning of their Assumption 1 in the next comment. It's clear that they are indicating that the test works for ordinal level data. But, using the signed rank test, since the first step is subtracting the paired observations, the data would have to be treated as interval in nature. (Or perhaps some level of data that's between ordinal and interval, where the differences could be ordered even if not numeric per se.) – Sal Mangiafico Feb 02 '23 at 14:48
  • "Your dependent variable should be measured at the ordinal or continuous level. Examples of ordinal variables include Likert items (e.g., a 7-point item from "strongly agree" through to "strongly disagree"), amongst other ways of ranking categories (e.g., a 5-point item explaining how much a customer liked a product, ranging from "Not very much" to "Yes, a lot")." – Sal Mangiafico Feb 02 '23 at 14:49
  • +1 @SalMangiafico Ah, got it… I did not dive into that description. – Alexis Feb 02 '23 at 16:13