Can I do a paired-samples t-test when my data are ordinal? My data are reading levels at time 1 and time 2, but the possible levels are A, 1, 2, 3, 4, 6, 8, 10, 12, 14, 20, 24, 28, 30, 34, 38, 40, 50, 60, 70, 80. They appear to be continuous, but they are not. Can I still use a paired t-test? These are the reading levels from K through 8th grade. I was able to run a paired t-test on the continuous data, which came from a standardized test with grade-level equivalency, but I am not sure whether I can also do it for the reading level data.
1 Answer
Because these data are ordinal, the normality assumption of the t-test (strictly, that the paired differences are normally distributed) will be violated. With that assumption violated, a typical paired t-test would at best lack sensitivity and at worst give spurious estimates. Fortunately, there are nonparametric alternatives to the t-test that do not depend on the assumption of normality, and so are better suited to ordinal data.
For these data, I would suggest the Wilcoxon signed-rank test. It is designed for paired comparisons on non-normal data.
Here is an example in R:
## first construct our samples to test
# pool of possible ordinal values (the non-numeric level "A" is left out here)
# not continuous, but the numerical order is assumed to be valid
pool <- c(1, 2, 3, 4, 6, 8, 10, 12, 14, 20, 24, 28, 30, 34, 38, 40, 50, 60, 70, 80)
# sample 1, randomly chosen from pool values
test1 <- sample(pool, 100, replace = TRUE)
# sample 2, randomly chosen from pool values
test2 <- sample(pool, 100, replace = TRUE)
# sample 3, pool values, weighted towards higher values (those at the end of pool)
prob_vec <- seq_along(pool) / sum(seq_along(pool))
test3_weighted <- sample(pool, 100, replace = TRUE, prob = prob_vec)
## run the Wilcoxon signed-rank test
# no seed is set, so the numbers below are from one run and will differ on yours
# test1 vs test2 should not differ significantly; both are chosen at random
wilcox.test(test1, test2, paired = TRUE)
# V = 1849.5, p-value = 0.1985
# test1 (or test2) vs test3_weighted should be significant; test3_weighted is
# weighted towards higher values
wilcox.test(test1, test3_weighted, paired = TRUE)
# V = 1221, p-value = 8.495e-05
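The simulated pool above leaves out the non-numeric level "A". As a rough sketch of how real labels might be handled, the levels can be stored as an ordered factor and converted to their rank positions before testing. The `time1` and `time2` vectors below are made-up illustration data, not the OP's, and all variable names are hypothetical. The sign test is included as an assumption-lighter companion: it uses only the direction of each change, which may be preferable if even the spacing between adjacent levels is in doubt.

```r
# All reading levels in their intended order, including the non-numeric "A"
levels_in_order <- c("A", "1", "2", "3", "4", "6", "8", "10", "12", "14",
                     "20", "24", "28", "30", "34", "38", "40", "50", "60",
                     "70", "80")

# Hypothetical paired observations for ten students (illustration only)
time1 <- c("A", "2", "4", "6", "3", "10", "8", "1", "12", "4")
time2 <- c("2", "4", "8", "6", "6", "14", "12", "3", "10", "8")

# Ordered factors keep the ordering without trusting the numeric labels
t1 <- factor(time1, levels = levels_in_order, ordered = TRUE)
t2 <- factor(time2, levels = levels_in_order, ordered = TRUE)

# Integer position of each level (1 = "A", 2 = "1", 3 = "2", ...)
r1 <- as.integer(t1)
r2 <- as.integer(t2)

# Wilcoxon signed-rank test on the level positions
# (exact = FALSE because ties and zero differences rule out an exact p-value)
signed_rank <- wilcox.test(r2, r1, paired = TRUE, exact = FALSE)

# Sign test: compare the number of increases with the number of decreases,
# dropping students whose level did not change
n_up <- sum(r2 > r1)
n_down <- sum(r2 < r1)
sign_test <- binom.test(n_up, n_up + n_down, p = 0.5)

print(signed_rank$p.value)
print(sign_test$p.value)
```

Working on level positions rather than the printed labels means the jump from 40 to 50 counts the same as the jump from 1 to 2, which is one way of respecting the ordinal nature of the scale.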
-
This is a reasonable start. From the comments, I gather the OP would like to control for covariates. How might you adapt / extend this analysis in that situation? – gung - Reinstate Monica Jan 27 '16 at 18:02
-
When controlling for covariates, paired tests can't do much.
However, if a regression model is being used, it is trivial in R to convert these 'reading levels' into factors with the factor() function. Once converted, the regression model will treat the values as distinct levels rather than as measures of magnitude.
The workflow would go something like this: run the regression model, including 'reading levels' as a factor.
Then do follow-up tests between groups of interest, using the wilcox.test procedure outlined above.
– timle Jan 28 '16 at 19:35 -
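To illustrate what the factor() conversion in the comment above changes, here is a minimal sketch on simulated data. The data frame, the column names (`reading_level`, `age_months`, `score`), and the choice of outcome are all assumptions made for illustration; nothing here comes from the OP's data set.

```r
set.seed(1)  # simulated data, for illustration only
n <- 60
level_values <- c(4, 6, 8, 10)

dat <- data.frame(
  reading_level = sample(level_values, n, replace = TRUE),
  age_months    = round(runif(n, 60, 84))   # hypothetical covariate
)
# Hypothetical outcome that rises with the position of the level, plus noise
dat$score <- 50 + 2 * match(dat$reading_level, level_values) + rnorm(n, sd = 5)

# Numeric predictor: a single slope, which treats the labels as magnitudes
fit_numeric <- lm(score ~ reading_level + age_months, data = dat)

# Factor predictor: one coefficient per level beyond the reference level
dat$reading_level_f <- factor(dat$reading_level)
fit_factor <- lm(score ~ reading_level_f + age_months, data = dat)

print(coef(fit_numeric))
print(coef(fit_factor))
```

The numeric fit estimates one slope for `reading_level`, while the factor fit estimates a separate offset for each level relative to the first, so it no longer assumes that equal differences in label mean equal differences in the outcome.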
Well, there is ordinal logistic regression (of which many classical nonparametric tests are special cases). There are mixed effects versions of OLR that are appropriate for repeated measures data (eg, see my answer here: Is there a two-way Friedman's test?). – gung - Reinstate Monica Jan 28 '16 at 19:45
-
Those are excellent suggestions, though I would start with 'reading level' as a factor in a conventional regression framework. If that regression model appears to violate the assumption of homoscedasticity, I would absolutely move to non-parametric regression options as you suggest. Based on the information given, though, all that can be ascertained is that a paired test on this categorical data should be of the non-parametric variety. There is no evidence (given in the post) to suggest that the same would be true of the regression model. – timle Jan 28 '16 at 19:54
I wanted to see the impact of the reading intervention on students' DRA levels. They should ideally be at level 6 at the end of kindergarten, but many students are far above that level. And to Glen's point, I phrased the question as "can I" because I did not know if I could. I am a novice and thought this was a safe place to pose any question.
– Lynette Jan 21 '16 at 15:26