Reasoning for using Chi square distribution for counts in Chi square test

Question

I know the chi-square distribution as the distribution of the sum of $n$ independent normal random variables, each individually squared, i.e.,

$$Z_1^2 + ... + Z_n^2\sim \chi_n^2$$

My question concerns the common "chi-square" test. The assumption is that each entry in a contingency table is a count, and that the cells of the contingency table are independent. The chi-square distribution is the distribution of a sum of squared, independent normal variables, yet we use that same distribution to test a hypothesis about a sum of squared terms built from counts, which are not normal (they are discrete, not continuous). So why is the chi-square test applicable here, when the assumed distribution covers normal random variables, not counts?

Your i.e. at the start is incorrect – Glen_b, Oct 29 '23 at 10:06

Sextus Empiricus · Answer 1 · 2023-10-29T10:56:54.060


The use of the $\chi^2$ distribution for non-normally distributed error terms or residuals is an approximation.
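How good the approximation is can be checked by simulation. The following is a minimal sketch, not part of the original derivations: the function names, the cell probabilities, and the sample sizes are chosen here purely for illustration. It draws multinomial counts for three cells under the null hypothesis, computes Pearson's statistic, and reports how often it exceeds the 5% critical value of $\chi^2_2$. If the approximation is good, that rate should be near 0.05.

```python
import math
import random
from collections import Counter


def pearson_statistic(counts, probs):
    """Pearson's X^2 = sum over cells of (observed - expected)^2 / expected."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def rejection_rate(n, probs, reps=2000, alpha=0.05, seed=0):
    """Fraction of simulated null samples whose X^2 exceeds the chi-square cutoff.

    Restricted to three cells (df = 2), because for df = 2 the chi-square
    survival function is exp(-x / 2), so the cutoff has a closed form and
    no statistics library is needed.
    """
    if len(probs) != 3:
        raise ValueError("this sketch only handles three cells (df = 2)")
    critical = -2.0 * math.log(alpha)  # upper-alpha quantile of chi-square, df = 2
    rng = random.Random(seed)
    cells = range(len(probs))
    rejections = 0
    for _ in range(reps):
        # One multinomial sample: n draws, each landing in a cell with prob p_i.
        tally = Counter(rng.choices(cells, weights=probs, k=n))
        counts = [tally[i] for i in cells]
        if pearson_statistic(counts, probs) > critical:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    probs = [0.2, 0.3, 0.5]  # illustrative cell probabilities
    for n in (10, 50, 500):
        print(n, rejection_rate(n, probs))
```

With small $n$ the statistic can only take a limited set of discrete values, so the rejection rate can deviate noticeably from the nominal level; as $n$ grows, the counts behave more like normal variables and the rate settles close to 0.05.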

For several derivations, see for instance: Obtaining the chi-squared test statistic via geometry

Also relevant is the De Moivre-Laplace theorem, which shows how a binomially distributed count variable can be approximated by a normal distribution.
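As a sketch of how the normal approximation leads to the $\chi^2$ statistic in the simplest case, take a table with two cells. The symbols $X$, $n$ and $p$ are introduced here for illustration: $X \sim \mathrm{Bin}(n, p)$ is the count in the first cell and $n - X$ the count in the second. By the De Moivre-Laplace theorem,

$$\frac{X - np}{\sqrt{np(1-p)}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty$$

and the Pearson statistic over the two cells simplifies to the square of that standardized count:

$$\frac{(X - np)^2}{np} + \frac{\big((n - X) - n(1-p)\big)^2}{n(1-p)} = \frac{(X - np)^2}{np(1-p)} \xrightarrow{d} \chi_1^2$$

The two cell counts are not independent, since they sum to $n$, which is why the limit has one degree of freedom rather than two.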

edited Oct 29 '23 at 10:56

answered Oct 29 '23 at 10:51
