
I have two groups of discrete data (integers):

  Group 1   Group 2
      101       103
      105       200
      115       150
       98       160
      100       115
      ...       ...

and I need to know whether they are significantly different. I know there are useful tests for this kind of comparison, such as the t-test or the Wilcoxon test. However, I was told that these statistical tests are for continuous data (not my case).

Then I tried a chi-squared test. However, its assumptions are not met, since there are a lot of cells with 0s. Even after combining my data into bins (e.g. 1-19, 20-39, etc.), I still have lots of cells with 0s. R throws this warning in this case:

Chi-squared approximation may be incorrect

I know there is also Monte Carlo simulation. However, it is just a simulation, and it always gives exactly the same p-value for all the different datasets I compare. I don't like this idea.

Fisher's exact test is practically impossible due to the size of my datasets. It only becomes feasible if I group my data into quite wide bins of 1000. However, I don't like this idea either.

In summary, do you know how I can deal with my data?

Notes:

  • My data is not paired.
  • Group 1 has about 30,000 observations, while group 2 has barely more than 4,000.
  • The data are extremely skewed. Example with two of my datasets:

Distributions of two of my datasets

  • Are your data paired? – Alexis Jul 23 '14 at 16:19
  • No, my data is not paired. – user2886545 Jul 23 '14 at 16:20
  • 1) Are the integers counts? 2) You can have zero observed counts in cells with a chi-square test. The issue is with low expected counts. 3) That you're getting 'exactly the same p-value for all your data sets': there's not enough detail here to tell what the issue is. What was this p-value and how were these simulations performed? It wouldn't happen to be that you used simulate.p.value=TRUE in chisq.test in R, with default B, and got p = 0.0004998, would it? This is simply 1/(B+1). That's to be expected if there's a strong effect or large n. – Glen_b Jul 23 '14 at 18:51
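
To illustrate the 1/(B+1) point in the last comment, here is a minimal sketch in Python. It is not R's chisq.test implementation: it uses a plain label-permutation test with the absolute difference in means as the statistic, and the function name, the toy groups, and the seed are all made up for illustration. The only thing it is meant to show is that when no simulated statistic reaches the observed one, the Monte Carlo p-value cannot be smaller than 1/(B+1), so every strongly separated pair of datasets reports the same value.

```python
import random


def monte_carlo_p_value(group1, group2, B=2000, seed=0):
    """Permutation-based Monte Carlo p-value (hypothetical helper).

    Returns (count + 1) / (B + 1), where count is the number of
    shuffled replicates whose statistic is at least as large as
    the observed one.
    """
    rng = random.Random(seed)
    pooled = list(group1) + list(group2)
    n1 = len(group1)

    def stat(a, b):
        # Absolute difference in means; an assumption for this sketch,
        # not the chi-squared statistic used by R.
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(group1, group2)
    count = 0
    for _ in range(B):
        rng.shuffle(pooled)  # reassign group labels at random
        if stat(pooled[:n1], pooled[n1:]) >= observed:
            count += 1
    return (count + 1) / (B + 1)


# Made-up, clearly separated groups (not the data from the question).
group1 = [100 + (i % 7) for i in range(300)]
group2 = [150 + (i % 7) for i in range(60)]

p = monte_carlo_p_value(group1, group2)
print(p)  # the floor 1 / (B + 1) with B = 2000
```

With B = 2000 the floor is 1/2001, which rounds to 0.0004998. Raising B lowers the floor, so a larger B is needed to resolve smaller p-values.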