I have a 2x3 contingency table: the row variable is a factor and the column variable is an ordered factor (ordinal level). I'd like to apply either a symmetrical or an asymmetrical association technique. What do you recommend? Which technique do you find most appropriate?
4 Answers
Linear or monotonic trend tests--$M^2$ association measure, WMW test cited by @GaBorgulya, or the Cochran-Armitage trend test--can also be used, and they are well explained in Agresti (CDA, 2002, §3.4.6, p. 90).
The latter is actually equivalent to a score test for testing $H_0:\; \beta = 0$ in a logistic regression model. It can be computed from the $M^2$ statistic, defined as $(n-1)r^2$ (asymptotically $\sim\chi^2(1)$ for large samples), where $r$ is the sample correlation coefficient between the two variables (with the ordinal variable recoded as numerical scores), by replacing $n-1$ with $n$ (ibid., p. 182). It is easy to compute in any statistical software, and you can also use the coin package in R (I provided an example of use in a related question).
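For concreteness, here is a minimal sketch of the $M^2$ computation described above, shown in Python with numpy/scipy (the statistic itself is software-agnostic). The table counts and the scores 1, 2, 3 for the ordered columns are made-up illustrations, not data from the question:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical 2x3 table (made-up counts): rows = nominal factor,
# columns = ordered factor, recoded with numerical scores 1, 2, 3.
table = np.array([[10, 15, 20],
                  [25, 10, 5]])
row_scores = np.array([0, 1])      # any two distinct scores work for a binary row
col_scores = np.array([1, 2, 3])

# Expand the table into one (x, y) score pair per observation
rows, cols = np.indices(table.shape)
w = table.ravel()
x = np.repeat(row_scores[rows.ravel()], w)
y = np.repeat(col_scores[cols.ravel()], w)

n = table.sum()
r = np.corrcoef(x, y)[0, 1]        # sample correlation between the scores
M2 = (n - 1) * r**2                # ~ chi-square(1) under independence
p_value = chi2.sf(M2, df=1)
```

Replacing `n - 1` with `n` in the last step gives the Cochran-Armitage version of the statistic.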
Sidenote
If you are using R, you will find useful resources in either Laura Thompson's R (and S-PLUS) Manual to Accompany Agresti’s Categorical Data Analysis (2002), which shows how to replicate Agresti's results with R, or the gnm package (and its companion packages, vcd and vcdExtra), which allows you to fit row-column association models (see the vignette, Generalized nonlinear models in R: An overview of the gnm package).
For a 2x3 contingency table where the three-level factor is ordered, you may use a rank correlation (Spearman's or Kendall's) to assess the association between the two variables.
You may also think about the data as an ordered variable observed in two groups. A corresponding significance test could be the Mann-Whitney test (with many ties). This has an associated measure of association, the WMW odds, related to Agresti’s generalized odds ratio.
Confidence intervals can be calculated both for rank correlation coefficients and for WMW odds. I find odds more intuitive, but I believe both kinds of measures are appropriate.
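Both approaches above can be run by expanding the table into one observation per count. A minimal Python sketch using scipy (the 2x3 counts are made up for illustration):

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau, mannwhitneyu

# Hypothetical 2x3 table (made-up counts): rows = two groups,
# columns = ordered levels coded 0, 1, 2.
table = np.array([[10, 15, 20],
                  [25, 10, 5]])

# Expand into one (group, level) pair per observation
rows, cols = np.indices(table.shape)
w = table.ravel()
g = np.repeat(rows.ravel(), w)
y = np.repeat(cols.ravel(), w)

rho, p_rho = spearmanr(g, y)       # Spearman's rho, using midranks for ties
tau, p_tau = kendalltau(g, y)      # Kendall's tau-b, which corrects for ties
u, p_u = mannwhitneyu(y[g == 0], y[g == 1])  # asymptotic p-value adjusts for ties
```

With many ties, all three p-values rest on normal approximations rather than exact distributions.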
Thanks for suggestions. I gave up on WMW because of ties. – aL3xa Apr 01 '11 at 22:51
One way to incorporate the ordering of the column factor into your analysis is to use the cumulative frequencies instead of the cell frequencies. So in your table you have:
$$f_{ij}=\frac{n_{ij}}{n_{\bullet\bullet}}\;\;\;\; i=1,2\;\;j=1,2,3$$
where a "$\bullet$" indicates summation over that index. So I suggest modeling instead:
$$g_{ij}=\sum_{k=1}^{j}f_{ik}$$
Now you basically have a simple null hypothesis of no association: that the index $i$ doesn't matter. So you have:
$$E(g_{ij}|H_{0})=\sum_{k=1}^{j}\frac{n_{\bullet k}}{n_{\bullet\bullet}}$$
And then use the good old "entropy" test statistic:
$$T(H_{0})=n_{\bullet\bullet}\sum_{i,j}g_{ij}\log\left(\frac{g_{ij}}{E(g_{ij}|H_{0})}\right)$$
Plugging in the numbers gives:
$$T(H_{0})=\sum_{i,j}\left(\sum_{k=1}^{j}n_{ik}\right)\log\left(\frac{\sum_{k=1}^{j}n_{ik}}{\sum_{k=1}^{j}n_{\bullet k}}\right)$$
You reject $H_{0}$ if this number is too big. It can be interpreted as a "log-odds" ratio, which helps with choosing cut-offs.
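A Python sketch of this statistic, computed exactly as in the last formula (the counts are made up; the permutation calibration of "too big" is an assumption of convenience, not part of the proposal above):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 2x3 table (made-up counts)
table = np.array([[10, 15, 20],
                  [25, 10, 5]])

def entropy_stat(tab):
    """T(H0) above: cumulative row counts against cumulative column totals."""
    cum_row = tab.cumsum(axis=1).astype(float)         # sum_{k<=j} n_ik
    cum_col = tab.sum(axis=0).cumsum().astype(float)   # sum_{k<=j} n_.k
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(cum_row > 0, cum_row * np.log(cum_row / cum_col), 0.0)
    return terms.sum()

T_obs = entropy_stat(table)

# Calibrate "too big" against a permutation null: shuffle the row labels,
# rebuild the table, and see how often the permuted statistic reaches T_obs.
rows, cols = np.indices(table.shape)
w = table.ravel()
g = np.repeat(rows.ravel(), w)
y = np.repeat(cols.ravel(), w)

n_perm = 2000
hits = 0
for _ in range(n_perm):
    tab = np.zeros_like(table)
    np.add.at(tab, (rng.permutation(g), y), 1)
    if entropy_stat(tab) >= T_obs:
        hits += 1
p_perm = (hits + 1) / (n_perm + 1)
```

Note that every summand is non-positive (a cumulative row count never exceeds the cumulative column total), so association shows up as the statistic being closer to zero than under the permutation null.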
Very interesting. Do you propose the last formula as the “measure of association”? – GaBorgulya Apr 02 '11 at 00:58
yep, although the expected value should really be adjusted to $\frac{1+\sum_{k=1}^{j}n_{\bullet k}}{3+n_{\bullet\bullet}}$. But this only matters if the counts are small. – probabilityislogic Apr 02 '11 at 01:23
You could use the Jonckheere-Terpstra test. In SAS, you can get it in PROC FREQ with the /JT option on the TABLES statement. I didn't see a function for it in R, but there may be one out there.
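The statistic itself is easy to compute directly. A small Python sketch (made-up counts; with only two groups, as in the question's 2x3 table, JT reduces to the Mann-Whitney count, and the permutation p-value here is an assumption of convenience rather than the usual normal approximation):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Hypothetical 2x3 table (made-up counts): rows = ordered groups,
# columns = ordinal response coded 0, 1, 2.
table = np.array([[10, 15, 20],
                  [25, 10, 5]])

# One response vector per (ordered) group
groups = [np.repeat([0, 1, 2], row) for row in table]

def jt_statistic(groups):
    """Jonckheere-Terpstra statistic: summed Mann-Whitney counts over all
    ordered pairs of groups, with half credit for ties."""
    jt = 0.0
    for a, b in combinations(range(len(groups)), 2):
        x, y = groups[a][:, None], groups[b][None, :]
        jt += (x < y).sum() + 0.5 * (x == y).sum()
    return jt

JT_obs = jt_statistic(groups)

# Two-sided permutation p-value: shuffle the pooled responses and resplit
pooled = np.concatenate(groups)
sizes = [len(g) for g in groups]
mean0 = (len(pooled)**2 - sum(s**2 for s in sizes)) / 4  # E[JT] under H0
n_perm = 2000
hits = 0
for _ in range(n_perm):
    parts = np.split(rng.permutation(pooled), np.cumsum(sizes)[:-1])
    if abs(jt_statistic(parts) - mean0) >= abs(JT_obs - mean0):
        hits += 1
p_perm = (hits + 1) / (n_perm + 1)
```

In R, contributed packages (for example clinfun, with jonckheere.test) provide this test ready-made.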
That's a useful, relevant test, but the question asked for a measure of association. Is there one that goes with this test? – onestop Apr 02 '11 at 06:34
Well, there is a JT test statistic, and a z equivalent, but it's not something like an odds ratio. So I guess it depends on exactly what you mean by "measure of association". – Peter Flom Apr 03 '11 at 11:09