Standardised residuals in contingency tables and statistically significant result

Question

I have the following contingency table, which gives a statistically significant result for Pearson's $\chi^2$ test for count data.

I've computed Pearson residuals, which should have the same properties as $z$-scores (as elaborated in the answer to "Correlation among categories between categorical nominal variables"). However, I've noticed several issues:

The mean value of Pearson residuals is 0.0388, not 0
The standard deviation is 1.216, not 1
None of the Pearson residuals exceeds 1.96 in absolute value. It looks like no residual deviates enough from its expected value to be a cause of a statistically significant $\chi^2$ test.

I have the following questions:

If the mean and sd of the residuals from the table do not have the properties of a standardised variable, how can we justify the use of the $z$ distribution?
Is it possible to say, from the current analysis, what causes the statistically significant $\chi^2$ test?

This example can be reproduced in R:

    data <- matrix(c(15, 25, 15, 60, 15, 20), nrow=3, byrow=TRUE)
    test <- chisq.test(data)
    mean(test$residuals) # Pearson residuals mean = 0.03877184
    sd(test$residuals)   # Pearson residuals sd = 1.215643

Thank you for your feedback in advance.

Glen_b · Accepted Answer · 2023-08-25T07:51:17.337

Pearson residuals are not least squares residuals; they don't have mean 0 in general, and their sample s.d. will not be 1 in general (least squares residuals in a regression won't have an s.d. of 1 either). Nor is it necessarily the case that any Pearson residual will exceed 1.96 in absolute value after a rejection of the null.
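
To make the first point concrete, here is a minimal sketch that rebuilds the Pearson residuals by hand as (observed - expected) / sqrt(expected), using the table from the question. The names `expected` and `pearson` are my own for illustration and are not components of the `chisq.test` result.

```r
# Counts from the question (3 x 2 table)
d <- matrix(c(15, 25, 15, 60, 15, 20), nrow = 3, byrow = TRUE)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(d), colSums(d)) / sum(d)

# Pearson residual for each cell: (observed - expected) / sqrt(expected)
pearson <- (d - expected) / sqrt(expected)

# These should agree with the residuals reported by chisq.test()
ch2 <- chisq.test(d, correct = FALSE)
all.equal(pearson, ch2$residuals, check.attributes = FALSE)

# The raw deviations sum to zero within each row (up to rounding error) ...
rowSums(d - expected)

# ... but dividing each one by a different sqrt(expected) breaks that,
# so the Pearson residuals need not average to zero
mean(pearson)
```

The raw deviations are what sum to zero; because each is scaled by a different amount, the scaled values generally do not.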

There is a form of standardized residual for a test of independence whose properties are a little closer to those of the kind of standardized residual you're used to, but the sum of squares of those residuals will not be the chi-squared statistic. For an uncorrected chi-squared test, the sum of the squared Pearson residuals does equal the statistic:

    d <- matrix(c(15, 25, 15, 60, 15, 20), nrow=3, byrow=TRUE)
    ch2 <- chisq.test(d,correct=FALSE)
    ch2$statistic  #  gives 7.397959 
    sum(ch2$residuals^2) # ditto
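
As a small sketch of how that statistic connects to the "significant" result in the question, the p-value can be recovered from the chi-squared distribution with (rows - 1) * (columns - 1) degrees of freedom. The 5% level used in the comparison is an assumption on my part; the question does not state a level.

```r
d <- matrix(c(15, 25, 15, 60, 15, 20), nrow = 3, byrow = TRUE)
ch2 <- chisq.test(d, correct = FALSE)

# Degrees of freedom for a test of independence in an r x c table
df <- (nrow(d) - 1) * (ncol(d) - 1)

# Upper-tail probability of the statistic under the null
stat <- sum(ch2$residuals^2)
p <- pchisq(stat, df = df, lower.tail = FALSE)

# Should match the p-value that chisq.test() reports
all.equal(p, ch2$p.value)

# Compare with an assumed 5% significance level
p < 0.05
```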

You may find the standardized residuals a little easier to interpret though:

    ch2$stdres
              [,1]      [,2]
    [1,]  1.208734 -1.208734
    [2,] -2.672612  2.672612
    [3,]  1.895682 -1.895682
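
For reference, a hedged sketch of how these standardized residuals can be computed by hand: each raw deviation is divided by an estimate of its standard error under independence, which includes the factors (1 - row proportion) and (1 - column proportion) that the Pearson residual leaves out. The names `v`, `stdres` and `flagged` are illustrative only.

```r
d <- matrix(c(15, 25, 15, 60, 15, 20), nrow = 3, byrow = TRUE)
n <- sum(d)

# Expected counts under independence
expected <- outer(rowSums(d), colSums(d)) / n

# Estimated variance of (observed - expected) in each cell:
# expected * (1 - row proportion) * (1 - column proportion)
v <- expected * outer(1 - rowSums(d) / n, 1 - colSums(d) / n)

# Standardized residuals
stdres <- (d - expected) / sqrt(v)

# These should agree with the stdres component of chisq.test()
ch2 <- chisq.test(d, correct = FALSE)
all.equal(stdres, ch2$stdres, check.attributes = FALSE)

# Cells whose standardized residual exceeds 1.96 in absolute value
flagged <- which(abs(stdres) > 1.96, arr.ind = TRUE)
flagged
```

In this table only the cells in the second row pass 1.96 in absolute value, which suggests that row is where the counts depart most from independence. Treat that as a descriptive reading, since it makes no adjustment for examining several cells at once.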

Thank you for your answer and your support on CrossValidated, I've learnt a lot from your replies. — Lstat, Aug 25 '23 at 07:32
