I have the following contingency table, which gives a statistically significant result for Pearson's $\chi^2$ test for count data.

[Image: contingency table; the counts are given in the R code below]

I've computed Pearson residuals, which should have the same properties as a z-score (as elaborated in the answer to "Correlation among categories between categorical nominal variables"). However, I've noticed several issues:

  1. The mean value of Pearson residuals is 0.0388, not 0
  2. The standard deviation is 1.216, not 1
  3. None of the Pearson residuals exceeds 1.96 in absolute value. It looks like no residual deviates enough from its expected value to account for a statistically significant $\chi^2$ test.

I have the following questions:

  1. If the mean and sd of the residuals from the table do not have the properties of a standardised variable, how can we justify the use of the $z$ distribution?
  2. Is it possible to say what causes the statistically significant $\chi^2$ test from the current analysis?

This example can be reproduced in R:

    data <- matrix(c(15, 25, 15, 60, 15, 20), nrow=3, byrow=TRUE)

    test <- chisq.test(data)
    mean(test$residuals)  # Pearson residuals mean = 0.03877184
    sd(test$residuals)    # Pearson residuals sd = 1.215643

Thank you for your feedback in advance.

Lstat

1 Answer

Pearson residuals are not least squares residuals. They don't have mean 0 in general, and their sample s.d. will not be 1 (least squares residuals in a regression won't have that in general either). Nor is it necessarily the case that any Pearson residual will exceed 1.96 in absolute value after a rejection of the null.
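To make this concrete, here is a sketch of the standard definitions. The notation is mine and does not appear in the question: $O_{ij}$ is the observed count in cell $(i,j)$, $E_{ij}$ is its expected count under independence, $n_{i\cdot}$ and $n_{\cdot j}$ are the row and column totals, $n$ is the grand total, and $r_{ij}$ is the Pearson residual.

$$
r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}, \qquad E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}
$$

$$
\sum_{i,j} \sqrt{E_{ij}}\; r_{ij} = \sum_{i,j} \left(O_{ij} - E_{ij}\right) = 0, \qquad X^2 = \sum_{i,j} r_{ij}^2
$$

Only the weighted sum of the residuals is constrained to be zero, so their plain mean generally is not. And because the test compares the total $X^2$ with a $\chi^2$ distribution on $(I-1)(J-1)$ degrees of freedom (for $I$ rows and $J$ columns), several moderate residuals can add up to a rejection without any single one passing 1.96.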

There is a form of standardized residual for a test of independence whose properties are a little closer to those of the kind of standardized residual you're used to, but their sum of squares will not be the chi-squared statistic. For an uncorrected chi-squared test, the sum of the squared Pearson residuals is the statistic:

    d <- matrix(c(15, 25, 15, 60, 15, 20), nrow=3, byrow=TRUE)
    ch2 <- chisq.test(d, correct=FALSE)
    ch2$statistic         # gives 7.397959
    sum(ch2$residuals^2)  # same value

You may find the standardized residuals a little easier to interpret though:

    ch2$stdres
              [,1]      [,2]
    [1,]  1.208734 -1.208734
    [2,] -2.672612  2.672612
    [3,]  1.895682 -1.895682
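
As a rough sketch of where those values come from, the standardized residuals can be rebuilt by hand: each Pearson residual is divided by the estimated standard deviation it would have under independence, $\sqrt{(1 - p_{i\cdot})(1 - p_{\cdot j})}$, where $p_{i\cdot}$ and $p_{\cdot j}$ are the row and column proportions. The variable names (`row_prop`, `col_prop`, `expected`, `pearson`, `stdres_by_hand`) are my own, chosen for illustration.

```r
# Counts from the question
d <- matrix(c(15, 25, 15, 60, 15, 20), nrow = 3, byrow = TRUE)

n        <- sum(d)
row_prop <- rowSums(d) / n                 # row proportions
col_prop <- colSums(d) / n                 # column proportions
expected <- n * outer(row_prop, col_prop)  # expected counts under independence

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson <- (d - expected) / sqrt(expected)

# Standardized residuals: Pearson residuals divided by their
# estimated standard deviation under the null
stdres_by_hand <- pearson / sqrt(outer(1 - row_prop, 1 - col_prop))

# Compare with what chisq.test() reports
ch2 <- chisq.test(d, correct = FALSE)
all.equal(pearson, ch2$residuals, check.attributes = FALSE)
all.equal(stdres_by_hand, ch2$stdres, check.attributes = FALSE)

# Cells whose standardized residual exceeds 1.96 in absolute value
which(abs(stdres_by_hand) > 1.96, arr.ind = TRUE)  # flags both cells of row 2
```

With these counts, only the second row has standardized residuals beyond ±1.96 (about ±2.67), which suggests that this row is where the departure from independence is most pronounced.
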
Glen_b