3

I have the following data set: 70 participants who answered arithmetic questions of different lengths. A wrong answer is coded as 0, a correct answer as 1. I want to know if there is an overall difference between the two question types. Since every participant answered both kinds of questions, my data are dependent. Which procedure should I use to find out whether the two types differ?

I would like to do the analysis in R.

Will the following give me correct results?

A McNemar test with the following contingency table:

         short  long
correct  w      x
wrong    y      z

Can anyone help?

Macro
Adam
  • Is this paired binary data? Are you just looking to see whether there's a change in the "correctness" probability before vs. after? – Macro Jul 03 '12 at 15:08
  • Yes, I am looking at whether there is a change in correctness between the items of the two different lengths. You can see it as before and after. If paired means that each participant filled out both types of questions and I can match their answers, then yes, it is. – Adam Jul 03 '12 at 15:13

2 Answers

5

The data may be thought of as arising from a table of the form

$$ \begin{array}{c|cc} & {\rm Question \ 1 - Yes} & {\rm Question \ 1 - No} \\ \hline {\rm Question \ 2 - Yes} & a & b \\ {\rm Question \ 2 - No} & c & d \\ \end{array} $$

with corresponding cell probabilities $p_a, p_b, p_c, p_d$. If the marginal "success" probabilities are the same for both questions, then $$p_a + p_b = p_a + p_c$$ and $$ p_c + p_d = p_b + p_d $$ Either way you look at it, $p_b$ and $p_c$ have to be equal for the two questions to have the same marginal probabilities. Thus, we test

$$ H_0 : p_b = p_c $$

and rejection of the null hypothesis indicates there is a difference. McNemar's Test gives an approximate (read: asymptotic) way of testing this hypothesis, which is a good approximation when the $b$ and $c$ cells are not too sparse. The test statistic is

$$ M = \frac{ (b-c)^2 }{b + c} $$

and is approximately $\chi^2$ distributed with 1 degree of freedom under the null hypothesis. To do this in R, you simply need to calculate the cell counts, compute m = (b-c)^2 / (b+c), and get the approximate $p$-value with 1 - pchisq(m, 1).
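As a rough sketch of the calculation described above, the following builds the paired 2 x 2 table from two 0/1 vectors and computes the statistic by hand. The vectors q1 and q2 are simulated purely for illustration (they are not the asker's data), and the variable names are my own. The last line calls R's built-in mcnemar.test with the continuity correction turned off, which should give the same statistic and $p$-value.

```r
# Hypothetical paired 0/1 outcomes for 70 participants (simulated for illustration)
set.seed(1)
q1 <- rbinom(70, 1, 0.8)   # 1 = correct on question 1
q2 <- rbinom(70, 1, 0.6)   # 1 = correct on question 2

# Paired 2 x 2 table: rows = question 2, columns = question 1, "1" = correct
tab <- table(factor(q2, levels = c(1, 0)),
             factor(q1, levels = c(1, 0)),
             dnn = c("q2", "q1"))

# Discordant cells, matching b and c in the table above
n_b <- tab["1", "0"]   # question 2 correct, question 1 wrong
n_c <- tab["0", "1"]   # question 2 wrong, question 1 correct

# McNemar statistic and its asymptotic chi-squared p-value (1 df)
m <- (n_b - n_c)^2 / (n_b + n_c)
p_value <- 1 - pchisq(m, df = 1)

# Built-in version; correct = FALSE switches off the continuity correction
builtin <- mcnemar.test(tab, correct = FALSE)
```

Note that by default mcnemar.test applies a continuity correction, so its output will differ slightly from the formula above unless correct = FALSE is set.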

Macro
  • Excellent post! Thank you. Still, one question: I have 15 items of each type. So the count in cell a would be all participants who answered both the short and the long question correctly. But with 15 short and 15 long questions, what exactly do I put into that table? – Adam Jul 03 '12 at 15:37
  • McNemar's test only allows you to do pairwise comparisons, so you won't be able to look at the collective heterogeneity in 15 questions at the same time. You could, hypothetically, look at every pair. But, in that case, you're doing $\binom{15}{2} = 105$ tests, so some correction for multiple testing should be done. – Macro Jul 03 '12 at 15:43
  • Is the Bonferroni correction the right solution here? – Adam Jul 03 '12 at 15:57
  • If your purpose here is descriptive, and not to test a scientific hypothesis, you may report the values of the test statistics as a measure of which pairs seem most discordant, without correction. But, if you are doing tests of 105 different hypotheses, then you should probably do some correction. You could use the Bonferroni (or Sidak) correction to control the probability of incorrectly rejecting the null hypothesis due purely to the number of tests. – Macro Jul 03 '12 at 16:09
  • Bonferroni and Sidak control the family-wise error rate (http://en.wikipedia.org/wiki/Familywise_error_rate), which is often quite conservative (i.e. it reduces power). A less conservative approach would control the false discovery rate (http://en.wikipedia.org/wiki/False_discovery_rate#Controlling_procedures) rather than the family-wise error rate. – Macro Jul 03 '12 at 16:10
  • All right, thanks for the info. Now I just have to find a way to calculate 105 tests and report them. I got myself into something here... – Adam Jul 03 '12 at 16:15
  • @Adam, re-reading your initial comment, it appears that maybe there are only 15 tests, not 105. If you have a short and a long version of each of 15 questions, and you want to see whether the answers are different, then you'd only have 15 tests. I interpreted it as every possible pair of short/long. – Macro Jul 03 '12 at 16:19
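
To make the comment thread above concrete, here is a minimal sketch of running one McNemar test per item pair and then adjusting the $p$-values. It assumes the 15-test reading, i.e. that short item j is matched with long item j; that pairing, the simulated score matrices, and the helper mcnemar_p are all hypothetical and only there for illustration. The adjustment uses R's p.adjust, once with Bonferroni (family-wise error rate) and once with Benjamini-Hochberg (false discovery rate).

```r
set.seed(42)
n_participants <- 70
n_items <- 15

# Hypothetical 70 x 15 matrices of 0/1 scores (simulated for illustration);
# column j of `short` is assumed to be matched with column j of `long`
short <- matrix(rbinom(n_participants * n_items, 1, 0.8), nrow = n_participants)
long  <- matrix(rbinom(n_participants * n_items, 1, 0.6), nrow = n_participants)

# Hypothetical helper: uncorrected McNemar p-value for two paired 0/1 vectors
mcnemar_p <- function(x, y) {
  n_b <- sum(x == 1 & y == 0)
  n_c <- sum(x == 0 & y == 1)
  if (n_b + n_c == 0) return(1)  # no discordant pairs, so no evidence of a difference
  m <- (n_b - n_c)^2 / (n_b + n_c)
  pchisq(m, df = 1, lower.tail = FALSE)
}

# One test per matched item pair
p_raw <- sapply(seq_len(n_items), function(j) mcnemar_p(short[, j], long[, j]))

# Multiple-testing adjustments
p_bonf <- p.adjust(p_raw, method = "bonferroni")  # controls family-wise error rate
p_bh   <- p.adjust(p_raw, method = "BH")          # controls false discovery rate
```

If every short item really had to be compared with every long item, the same helper could be applied over all pairs instead, with the adjustment then made over that larger set of $p$-values.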
0

Paired proportions have traditionally been compared using McNemar's test, but an exact alternative due to Liddell (1983) is preferable.

https://stats.stackexchange.com/a/152257/77102
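
For a quick exact check in base R, one commonly used option is an exact binomial test on the discordant pairs: under $H_0 : p_b = p_c$, the count $b$ is Binomial($b + c$, 0.5) given $b + c$. I am not claiming this reproduces Liddell's formulation step by step; it is the standard "exact McNemar" calculation, sketched here with made-up counts.

```r
# Hypothetical discordant-pair counts (made up for illustration)
n_b <- 12   # correct on one question type only
n_c <- 4    # correct on the other question type only

# Exact two-sided test of H0: p_b = p_c, conditional on n_b + n_c
exact <- binom.test(n_b, n_b + n_c, p = 0.5)
exact$p.value
```

This avoids the chi-squared approximation, which matters most when the discordant cells are sparse.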