
Say we have an $R \times C$ table, where $R$ and $C$ are not necessarily equal. When is the maximum value of the chi-square statistic achieved, and how can this be proven?

Tal Galili
  • What other constraints do you want to put on this? For example, I assume you're interested in the maximum value for a certain fixed sample size. (??) – cardinal Aug 01 '11 at 17:53
  • Indeed. This is a follow-up question to enable the generalization of this: http://stats.stackexchange.com/questions/13211/what-is-the-maximum-for-pearsons-chi-square-statistic – Tal Galili Aug 01 '11 at 17:56
  • Hi Caracal - where might I find this proof? (thanks) – Tal Galili Aug 01 '11 at 18:05

1 Answer


Assume we have an $I \times J$ table of relative frequencies $f_{ij} \; (1 \leq i \leq I, 1 \leq j \leq J)$, where (without loss of generality) $I \leq J$:

$ \begin{array}{ccccc|l} f_{11} & \ldots & f_{1j} & \ldots & f_{1J} & f_{1.} \\ \vdots & \ddots & \vdots & \ddots & \vdots & \vdots \\ f_{i1} & \ldots & f_{ij} & \ldots & f_{iJ} & f_{i.} \\ \vdots & \ddots & \vdots & \ddots & \vdots & \vdots \\ f_{I1} & \ldots & f_{Ij} & \ldots & f_{IJ} & f_{I.} \\\hline f_{.1} & \ldots & f_{.j} & \ldots & f_{.J} & 1 \end{array} $

Now define $\varphi^{2} := \chi^{2} / N = \sum_{i}\sum_{j} \frac{(f_{ij} - e_{ij})^{2}}{e_{ij}}$, where $N$ is the total sample size and $e_{ij} := f_{i.} f_{.j}$. The claim is that $\chi^{2} \leq N (I-1)$, i.e., $\varphi^{2} \leq I-1$. For $\varphi^{2}$ to be defined, we need all $e_{ij} > 0$, i.e., all $f_{i.} > 0$ and all $f_{.j} > 0$. This means that each row, as well as each column, must contain at least one $f_{ij} > 0$. Now rewrite $\varphi^{2} = \left(\sum_{i}\sum_{j} \frac{f_{ij}^{2}}{e_{ij}}\right) - 1$. Adding 1 to both sides, the claim can be restated as $\sum_{i}\sum_{j} \frac{f_{ij}^{2}}{e_{ij}} \leq I$.
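In case the rewriting step is not obvious: expand the square and use $\sum_{i}\sum_{j} f_{ij} = 1$ as well as $\sum_{i}\sum_{j} e_{ij} = \sum_{i} f_{i.} \sum_{j} f_{.j} = 1$:

$ \sum_{i}\sum_{j} \frac{(f_{ij} - e_{ij})^{2}}{e_{ij}} = \sum_{i}\sum_{j} \frac{f_{ij}^{2}}{e_{ij}} - 2 \sum_{i}\sum_{j} f_{ij} + \sum_{i}\sum_{j} e_{ij} = \sum_{i}\sum_{j} \frac{f_{ij}^{2}}{e_{ij}} - 1 $

The restated claim then follows because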

$ \begin{array}{rcl} \sum_{i}\sum_{j} \frac{f_{ij}^{2}}{e_{ij}} &=& \sum_{i}\sum_{j} \frac{f_{ij}^{2}}{f_{i.} f_{.j}} = \sum_{i}\sum_{j} \frac{f_{ij}}{f_{i.}} \frac{f_{ij}}{f_{.j}}\\ &\leq& \sum_{i}\sum_{j} \frac{f_{ij}}{f_{i.}} = \sum_{i} \left(\frac{1}{f_{i.}} \sum_{j} f_{ij}\right)\\ &=& \sum_{i} \left(\frac{1}{f_{i.}} f_{i.}\right) = \sum_{i} 1 = I \end{array} $

The crucial step is from the first to the second line. The inequality holds because $0 \leq \frac{f_{ij}}{f_{.j}} \leq 1$, so that $\frac{f_{ij}}{f_{i.}} \frac{f_{ij}}{f_{.j}} \leq \frac{f_{ij}}{f_{i.}}$ for all $i, j$. If each term of one sum is no larger than the corresponding term of a second sum, then the first sum is no larger than the second.
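As a quick numerical sanity check of the bound (a sketch in Python with NumPy; the helper name `phi_squared` is mine, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)

def phi_squared(counts):
    """phi^2 = chi^2 / N for a table of counts."""
    f = counts / counts.sum()                    # relative frequencies f_ij
    e = np.outer(f.sum(axis=1), f.sum(axis=0))   # e_ij = f_i. * f_.j
    return ((f - e) ** 2 / e).sum()

I, J = 3, 5
for _ in range(1000):
    counts = rng.integers(1, 20, size=(I, J))    # all cells > 0, so all e_ij > 0
    assert phi_squared(counts) <= I - 1 + 1e-12  # phi^2 <= I - 1
```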

$\varphi^{2}$ attains the value $I-1$ when $\frac{f_{ij}}{f_{i.}} \frac{f_{ij}}{f_{.j}} = \frac{f_{ij}}{f_{i.}}$ for all $i, j$. This happens when, for all $i, j$, either $\frac{f_{ij}}{f_{.j}} = 1$ or $\frac{f_{ij}}{f_{i.}} = 0$, i.e., when $f_{ij} = f_{.j}$ or $f_{ij} = 0$. This is the case under complete dependence, i.e., when in each column only one cell has a nonzero entry.
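To illustrate the equality case, here is a small sketch (Python/NumPy, same conventions as the check above; the particular row assignment is an arbitrary choice of mine) that builds a completely dependent $3 \times 5$ table and confirms $\chi^{2} = N(I-1)$:

```python
import numpy as np

rng = np.random.default_rng(1)

I, J = 3, 5
# Complete dependence: column j puts all of its mass in row r(j).
# Every row must receive at least one column so that all f_i. > 0.
r = np.array([0, 1, 2, 0, 1])                # r(j) for j = 0, ..., J-1
counts = np.zeros((I, J))
counts[r, np.arange(J)] = rng.integers(1, 20, size=J)

N = counts.sum()
f = counts / N
e = np.outer(f.sum(axis=1), f.sum(axis=0))
chi2 = N * ((f - e) ** 2 / e).sum()
print(chi2, N * (I - 1))                     # equal, up to floating-point error
```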

caracal
  • Perfect, exactly what I was looking for - thank you! I was stuck in the second line when working on this myself. – Tal Galili Aug 01 '11 at 19:17