
I'm looking into comparing charge/cost (economic) data between paired samples (i.e., pre vs. post). The sample size is about 150 pairs, and charge/cost is highly skewed with a long right tail.

I'm concerned about using a paired t-test because the data violate the normality assumption, and I've been looking into the coin package in R for its implementation of the Fisher-Pitman permutation test. From some further reading, a bootstrapped hypothesis test also seems possible. Would a Wilcoxon signed-rank test be appropriate in this case? What would be most appropriate in this situation?
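To make the permutation idea concrete, here is a minimal sketch of a Monte Carlo sign-flip test on the paired differences, which is the logic behind a Fisher-Pitman test for paired data. It is written in plain Python rather than with coin, and the function name `paired_permutation_test`, the resample count, and the simulated log-normal costs are all hypothetical choices made for illustration.

```python
import random

def paired_permutation_test(pre, post, n_resamples=10000, seed=0):
    """Monte Carlo sign-flip permutation test for the mean paired difference."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(pre, post)]
    observed = abs(sum(diffs))
    # Under H0 each difference is equally likely to be positive or negative,
    # so flip signs at random and count sums at least as extreme as observed.
    extreme = 0
    for _ in range(n_resamples):
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)

# Hypothetical skewed cost data (an assumption, not the asker's data):
# log-normal "pre" costs and "post" costs scaled by a random factor.
data_rng = random.Random(42)
pre = [data_rng.lognormvariate(8, 1) for _ in range(150)]
post = [x * data_rng.lognormvariate(0.3, 0.5) for x in pre]

p_value = paired_permutation_test(pre, post)
print(f"two-sided p-value: {p_value:.4f}")
```

The test statistic here is the sum of differences, so the test targets the mean difference without assuming normality; it does assume that, under the null, the differences are symmetric about zero.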

Michael Luu

2 Answers

1

Wilcoxon is indicated for what you describe.

[Edit] as @greenparker said this needs some explaining.

The t-test assumes that a fitted normal distribution describes the data. For a paired test this applies to the within-pair differences; the equal-standard-deviation assumption belongs to the two-sample version.

If those assumptions are not suitable, or a plot shows that they are badly violated, then either use a method that does not make them or transform the data (e.g., by taking logs).

The Wilcoxon signed-rank test is the distribution-free counterpart of the t-test for paired data.
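As a rough illustration of what the signed-rank test computes, here is a minimal Python sketch using the large-sample normal approximation. The function name `wilcoxon_signed_rank` is hypothetical, zero differences are dropped, tied absolute differences get average ranks, and no continuity correction is applied, so the p-value may differ slightly from what R's `wilcox.test(post, pre, paired = TRUE)` reports.

```python
from statistics import NormalDist

def wilcoxon_signed_rank(pre, post):
    """Return (W+, two-sided p-value) using the normal approximation."""
    # Drop pairs with zero difference, as is conventional for this test.
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    # W+ is the sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected
    z = (w_plus - mean) / var ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, p

# Toy example: every post value exceeds its pre value.
w, p = wilcoxon_signed_rank([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(w, round(p, 4))
```

With about 150 pairs the normal approximation should be adequate; for very small samples an exact null distribution would be the safer choice.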

  • According to the German and Catalan Wikipedia articles, the Wilcoxon test assumes a symmetrical distribution. Answers to http://stats.stackexchange.com/questions/14434/appropriateness-of-wilcoxon-signed-rank-test give somewhat conflicting statements about it. So is skewness a problem for the Wilcoxon test? If it is, the test would be less suitable for this question. – Pere Jul 29 '16 at 12:19
  • I agree with commenting on and discussing the details, but Wikipedia does not seem to be a reliable source. My stats professor edited Wikipedia, adding intentional errors to make us think about what a "correct" or a "bad" solution can be. Please check this page from Northwestern University: http://www.basic.northwestern.edu/statguidefiles/srank_paired_ass_viol.html – pachadotdev Jul 29 '16 at 20:41
0

If you just want to compare the means, the t-test can be used, since it is robust to departures from normality. See my previous answer: Hypothesis testing options on non-normal populations

Alternatively, you can compare them using nonparametric approaches such as the one proposed here, where the authors calculate $P(X<Y)$ for paired observations in order to assess discrepancies between $X$ and $Y$. Alternative parametric approaches using skewed dependent distributions have also been studied here
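For intuition only, here is a minimal Python sketch of a naive empirical estimate of $P(X<Y)$ from paired observations, with a percentile bootstrap interval obtained by resampling whole pairs. This is not the method of the paper referred to above; the function names, counting ties as one half, and the bootstrap settings are all assumptions made for illustration.

```python
import random

def prob_x_less_than_y(x, y):
    """Share of pairs with x < y; ties count one half (an assumption)."""
    wins = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a, b in zip(x, y))
    return wins / len(x)

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling pairs with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(x, y))
    n = len(pairs)
    stats = []
    for _ in range(n_boot):
        # Resample whole pairs so the within-pair dependence is preserved.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        xs, ys = zip(*sample)
        stats.append(prob_x_less_than_y(xs, ys))
    stats.sort()
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Toy paired data (hypothetical values).
x = [1, 2, 3, 4]
y = [2, 1, 4, 4]
estimate = prob_x_less_than_y(x, y)
lower, upper = bootstrap_ci(x, y)
print(estimate, lower, upper)
```

An estimate near 0.5 suggests no systematic shift between $X$ and $Y$, while values far from 0.5 indicate that one member of the pair tends to be larger.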

Dorian