I've always been under the impression that McNemar's test is for paired categorical data, just as the paired t-test is for paired continuous data. However, I was looking at the Wikipedia article for it (https://en.wikipedia.org/wiki/McNemar%27s_test), and I'm somewhat confused. Suppose we're comparing two tests on the same subjects.
| | Test 2 positive | Test 2 negative |
|---|---|---|
| Test 1 positive | a | b |
| Test 1 negative | c | d |
From what I'm reading, McNemar's test uses the null hypothesis $p_b = p_c$.
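For reference, the test statistic given in the Wikipedia article for this null hypothesis (the version without continuity correction, compared against a chi-squared distribution with 1 degree of freedom) uses only the discordant counts $b$ and $c$ from the table above:

$$\chi^2 = \frac{(b - c)^2}{b + c}$$

The concordant counts $a$ and $d$ do not appear in it.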
Wikipedia says the following: "Now presume two tests are performed on the same group of patients. And also presume that these tests have identical sensitivity and specificity. In this situation one is carried away by these findings and presume that both the tests are equivalent. However this may not be the case. For this we have to study the patients with disease and patients without disease (by a reference test). We also have to find out where these two tests disagree with each other. This is precisely the basis of McNemar's test."
This was my understanding as well. But suppose there are 20 subjects, and after running Test 1 and 2 on them, we have
| | Test 2 positive | Test 2 negative |
|---|---|---|
| Test 1 positive | 2 | 8 |
| Test 1 negative | 8 | 2 |
so the two tests agree on only 4 subjects and disagree on 16. Clearly, Test 1 and Test 2 don't agree much at all. And yet McNemar's test has a p-value of 1, giving us a conclusion of no difference.
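A minimal sketch of how the p-value of 1 arises for this table, assuming the exact (binomial) version of the test applied to the discordant counts. The function name `mcnemar_exact_p` is hypothetical (not a library API), and only the standard library is used. For comparison, it also computes each test's share of positive results and the raw agreement rate.

```python
from math import comb

def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value (hypothetical helper).

    Under the null hypothesis p_b = p_c, each discordant pair falls in
    cell b or cell c with probability 1/2, so b ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, nothing to test
    k = min(b, c)
    # Lower-tail probability P(X <= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1
    return min(1.0, 2 * tail)

# Rows: Test 1 positive / negative; columns: Test 2 positive / negative
table = [[2, 8], [8, 2]]
(a, b), (c, d) = table
n_total = a + b + c + d

p_value = mcnemar_exact_p(b, c)
test1_pos = (a + b) / n_total   # share of subjects positive on Test 1
test2_pos = (a + c) / n_total   # share of subjects positive on Test 2
agreement = (a + d) / n_total   # share of subjects where the tests agree

print(p_value, test1_pos, test2_pos, agreement)
```

In this table each test is positive for 10 of the 20 subjects, while the two tests agree on only 4 of the 20.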
What am I misunderstanding here?