Problems in multiple comparisons after a significant chi-squared test

Question

I ran a chi-squared test on the following cross-table and got some odd results. Can anyone help me work out the problem?

            Category
  AGE_Category Very Low  Low Medium High Very High
       (0,1]     9763 24097   7113 11431      2389
       (1,2]    14053 17048  12438 10869      1930
       (2,3]     2797 10106  12768  6751      1551
       (3,4]        3  3771   4810  2494      1253

> chisq.test(AGE_Category,Category, simulate.p.value = TRUE,correct=FALSE)

    Pearson's Chi-squared test with simulated p-value (based on 2000
    replicates)

data:  AGE_Category and Category
X-squared = 15520, df = NA, p-value = 0.0004998

> chisq.post.hoc(table(AGE_Category,Category), test="chisq.test")
Adjusted p-values used the fdr method.

       comparison raw.p adj.p

1 (0,1] vs. (1,2]     0     0
2 (0,1] vs. (2,3]     0     0
3 (0,1] vs. (3,4]     0     0
4 (1,2] vs. (2,3]     0     0
5 (1,2] vs. (3,4]     0     0
6 (2,3] vs. (3,4]     0     0

I changed the long variable names before I posted it here (to make the code shorter), and that should not be the reason for the zero results. — jin mu, Jan 18 '17 at 10:51
What is the data type of your data set? Calling table() may not produce the results you expect-- have you verified that the result of that call is what you intend? — Upper_Case, Jan 18 '17 at 13:33
Is the function chisq.post.hoc from this package? Please clarify! Posts must include such essential information — kjetil b halvorsen, Feb 09 '21 at 13:36

score 0 · Answer 1 · answered Sep 23 '23 at 17:33

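Your sample is so large, and the age categories differ so much, that the p-values are extremely small. The value 2.22e-16 is R's machine epsilon, which is only the threshold below which a printed p-value is shown as "< 2.2e-16". The stored p-value becomes exactly 0 when it is too small to be represented as a double, which is what happens with test statistics as large as yours.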

For instance, if you take your first post-hoc chi-squared test ("(0,1] vs. (1,2]"), essentially it uses the base R chisq.test function to conduct a chi-squared test on the first two rows of your original table. This is the same thing as:

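# First two rows of the original table: (0,1] and (1,2]
tab = rbind(c(9763, 24097,   7113, 11431,      2389),
           c(14053, 17048,  12438, 10869,      1930))
res = chisq.test(tab)
# The stored p-value, not the formatted one
res$p.value
[1] 0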

As you see, the output of base R chisq.test is consistent with the output of the chisq.post.hoc function you used.
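To make the comparison concrete, here is a minimal base R sketch of the same pairwise procedure. It assumes that the post-hoc step amounts to running chisq.test on each pair of rows and then adjusting with p.adjust(method = "fdr"); it is not the fifer source code, and the row names and the `pairwise` data frame are only illustrative.

```r
# Original table, with row names added here for labelling the comparisons.
tab <- rbind(
  "(0,1]" = c(9763, 24097,  7113, 11431, 2389),
  "(1,2]" = c(14053, 17048, 12438, 10869, 1930),
  "(2,3]" = c(2797, 10106, 12768,  6751, 1551),
  "(3,4]" = c(3,     3771,  4810,  2494, 1253)
)

# All pairs of row indices: 6 pairs for 4 rows.
pairs <- combn(nrow(tab), 2, simplify = FALSE)

# One asymptotic chi-squared test per pair of rows.
raw_p <- vapply(pairs, function(idx) chisq.test(tab[idx, ])$p.value, numeric(1))

pairwise <- data.frame(
  comparison = vapply(pairs,
                      function(idx) paste(rownames(tab)[idx], collapse = " vs. "),
                      character(1)),
  raw.p = raw_p,
  adj.p = p.adjust(raw_p, method = "fdr")  # same adjustment method as reported above
)
print(pairwise)
```

Every raw p-value from this sketch is far below anything that would change the conclusion, so a table of zeros is what you should expect from the asymptotic test on these data.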

Note that in its default output, R doesn't display "p-value = 0", but "p-value < 2.2e-16":

print(res)

    Pearson's Chi-squared test

data:  tab
X-squared = 3472.9, df = 4, p-value < 2.2e-16

The chisq.post.hoc function isn't that subtle, and just displays the actual value assigned by R to the p-value (0), instead of < 2.2e-16.
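The difference between the stored value and the printed value can be checked directly in base R. The sketch below uses the statistic reported above (X-squared = 3472.9, df = 4); the second statistic (200) is a made-up value, chosen only to show a p-value that is below machine epsilon but still nonzero.

```r
# Machine epsilon: the threshold used when a p-value is formatted for printing.
eps <- .Machine$double.eps
eps

# Upper-tail probability for the statistic reported above.
p <- pchisq(3472.9, df = 4, lower.tail = FALSE)
p == 0   # the value is too small for a double, so it is stored as 0

# Hypothetical smaller statistic: the p-value is below eps but still nonzero.
p_small <- pchisq(200, df = 4, lower.tail = FALSE)
p_small
format.pval(p_small)   # formatted as "less than eps", although the stored value is not 0

# Working on the log scale avoids the underflow entirely.
log_p <- pchisq(3472.9, df = 4, lower.tail = FALSE, log.p = TRUE)
log_p / log(10)        # base-10 exponent of the p-value
```

The log-scale value is mostly a curiosity: a p-value with an exponent in the hundreds and a p-value printed as 0 lead to the same decision.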

If you use Monte Carlo simulation, the p-values will no longer be zero, but they will still be very small. You should ask yourself if it changes the interpretation you have of the results.

Here is your original table again, with chisq.post.hoc from the R package you used (fifer), this time with a Monte Carlo simulation of 100,000 replicates instead of the asymptotic chi-squared distribution:

library(fifer)
tab = rbind(c(9763, 24097,   7113, 11431,      2389),
          c(14053, 17048,  12438, 10869,      1930),
          c(2797, 10106,  12768,  6751,      1551),
          c(   3,  3771,   4810,  2494,      1253))
chisq.post.hoc(tab, test="chisq.test", simulate.p.value=TRUE, B=100000, digits = 10)
Adjusted p-values used the fdr method.
comparison      raw.p      adj.p
1            9.9999e-06 9.9999e-06
2            9.9999e-06 9.9999e-06
3            9.9999e-06 9.9999e-06
4            9.9999e-06 9.9999e-06
5            9.9999e-06 9.9999e-06
6            9.9999e-06 9.9999e-06
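The identical values in this output are the smallest p-value the simulation can return. With simulate.p.value = TRUE, base R's chisq.test computes the p-value as (1 + number of simulated statistics at least as large as the observed one) / (B + 1), so when no simulated table is as extreme as the data the result is 1 / (B + 1). A short check using the first two rows of the table (the seed and the name `res_mc` are arbitrary choices for this sketch):

```r
tab2 <- rbind(c(9763, 24097,  7113, 11431, 2389),
              c(14053, 17048, 12438, 10869, 1930))

set.seed(1)   # only for reproducibility of the simulation
B <- 2000     # default number of replicates in chisq.test
res_mc <- chisq.test(tab2, simulate.p.value = TRUE, B = B)

res_mc$p.value     # equals the floor 1 / (B + 1)
1 / (B + 1)        # 0.0004998 to four significant digits, as in the question
1 / (100000 + 1)   # 9.9999e-06, as in the output above
```

So these simulated p-values only say "smaller than about 1/B". Increasing B lowers the floor but does not change the conclusion.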
