I ran the chi-squared test on the following cross-table and got some weird results. Can anyone help me work out the problem?

            Category
  AGE_Category Very Low  Low Medium High Very High
       (0,1]     9763 24097   7113 11431      2389
       (1,2]    14053 17048  12438 10869      1930
       (2,3]     2797 10106  12768  6751      1551
       (3,4]        3  3771   4810  2494      1253

> chisq.test(AGE_Category,Category, simulate.p.value = TRUE,correct=FALSE)

    Pearson's Chi-squared test with simulated p-value (based on 2000
    replicates)

data:  AGE_Category and Category
X-squared = 15520, df = NA, p-value = 0.0004998

> chisq.post.hoc(table(AGE_Category,Category), test="chisq.test")
Adjusted p-values used the fdr method.

       comparison raw.p adj.p

1 (0,1] vs. (1,2]     0     0
2 (0,1] vs. (2,3]     0     0
3 (0,1] vs. (3,4]     0     0
4 (1,2] vs. (2,3]     0     0
5 (1,2] vs. (3,4]     0     0
6 (2,3] vs. (3,4]     0     0
jin mu
  • 21

1 Answer

Your sample is very large and the age categories differ strongly, so the p-values are extremely small. Once a p-value falls below the smallest positive number a double can hold (far below 2.2e-16, which is only the threshold R uses when printing), R stores it as exactly 0.

For instance, if you take your first post-hoc chi-squared test ("(0,1] vs. (1,2]"), essentially it uses the base R chisq.test function to conduct a chi-squared test on the first two rows of your original table. This is the same thing as:

tab = rbind(c( 9763, 24097,  7113, 11431, 2389),   # row (0,1]
            c(14053, 17048, 12438, 10869, 1930))   # row (1,2]
res = chisq.test(tab)
res$p.value
[1] 0

As you see, the output of base R chisq.test is consistent with the output of the chisq.post.hoc function you used.
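To make the equivalence concrete for all six comparisons, here is a minimal base-R sketch of what a pairwise post-hoc amounts to: one chisq.test per pair of rows, followed by an fdr adjustment with p.adjust. It is an illustration under that assumption, not the actual source of chisq.post.hoc, and the names pairs, raw_p, adj_p and posthoc are made up for this example.

```r
# Counts copied from the question; row labels are the age intervals.
tab <- rbind(c( 9763, 24097,  7113, 11431, 2389),
             c(14053, 17048, 12438, 10869, 1930),
             c( 2797, 10106, 12768,  6751, 1551),
             c(    3,  3771,  4810,  2494, 1253))
rownames(tab) <- c("(0,1]", "(1,2]", "(2,3]", "(3,4]")

# Every unordered pair of rows: a 2 x 6 matrix of row indices.
pairs <- combn(nrow(tab), 2)

# One chi-squared test per pair, run on the 2 x 5 sub-table.
raw_p <- apply(pairs, 2, function(ix) chisq.test(tab[ix, ])$p.value)

# Benjamini-Hochberg ("fdr") adjustment across the six comparisons.
adj_p <- p.adjust(raw_p, method = "fdr")

posthoc <- data.frame(
  comparison = paste(rownames(tab)[pairs[1, ]], "vs.",
                     rownames(tab)[pairs[2, ]]),
  raw.p = raw_p,
  adj.p = adj_p
)
print(posthoc)
```

With counts this large, every pairwise statistic is huge, so each raw p-value is either an exact 0 or indistinguishable from it at any sensible number of digits.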

Note that in its default printed output, R doesn't display "p-value = 0", but "p-value < 2.2e-16":

print(res)

    Pearson's Chi-squared test

data:  tab
X-squared = 3472.9, df = 4, p-value < 2.2e-16

The chisq.post.hoc function isn't that subtle, and just displays the actual value assigned by R to the p-value (0), instead of < 2.2e-16.
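If you want to see the difference between the stored value and the printed one, the short check below may help. It uses the statistic and degrees of freedom from the printed result above; the second statistic (100 on 1 df) is an arbitrary illustrative value, not something from your data.

```r
# Statistic and degrees of freedom taken from the printed test result.
p_stored <- pchisq(3472.9, df = 4, lower.tail = FALSE)
p_stored == 0                    # the upper tail underflows to an exact zero

# A small but still representable p-value (illustrative statistic only).
p_small <- pchisq(100, df = 1, lower.tail = FALSE)
p_small > 0                      # stored as a nonzero number
p_small < .Machine$double.eps    # but below the threshold used when printing

# format.pval() is the helper behind the "<" display in printed test results.
format.pval(c(p_stored, p_small))
```

Both values are shown with a leading "<", but only the first one is actually 0 in memory. So 2.2e-16 (machine epsilon) is a display cutoff, whereas the zero itself comes from floating-point underflow.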

If you use Monte Carlo simulation, the p-values will no longer be zero, but they will still be very small. You should ask yourself whether that changes your interpretation of the results.

Here is the same analysis on your original dataset, with chisq.post.hoc from the R package you used (fifer), but this time using a Monte Carlo simulation with 100,000 replicates instead of the asymptotic chi-squared distribution:

library(fifer)
tab = rbind(c(9763, 24097,   7113, 11431,      2389),
          c(14053, 17048,  12438, 10869,      1930),
          c(2797, 10106,  12768,  6751,      1551),
          c(   3,  3771,   4810,  2494,      1253))

chisq.post.hoc(tab, test="chisq.test", simulate.p.value=TRUE, B=100000, digits = 10)
Adjusted p-values used the fdr method.

  comparison      raw.p      adj.p
1            9.9999e-06 9.9999e-06
2            9.9999e-06 9.9999e-06
3            9.9999e-06 9.9999e-06
4            9.9999e-06 9.9999e-06
5            9.9999e-06 9.9999e-06
6            9.9999e-06 9.9999e-06
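The six identical values are worth a second look. Base R's chisq.test computes a simulated p-value as (1 + number of simulated tables at least as extreme as the observed one) / (B + 1), so it can never be smaller than 1 / (B + 1). Assuming chisq.post.hoc simply passes simulate.p.value and B through to chisq.test, the sketch below reproduces the numbers seen in this thread; min_sim_p is a helper name made up for the example.

```r
# Smallest p-value a Monte Carlo chi-squared test can return with B replicates:
# it is reached when no simulated table is as extreme as the observed one.
min_sim_p <- function(B) 1 / (B + 1)

signif(min_sim_p(2000), 4)      # 0.0004998, the question's run (B = 2000)
signif(min_sim_p(100000), 5)    # 9.9999e-06, this answer's run (B = 100000)
```

In other words, the simulated p-values here only tell you that none of the replicates came close to the observed tables. Increasing B lowers the floor, but it does not give you a more precise estimate of how small the true p-value is.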

J-J-J
  • 4,098