
Say I have two models (classifiers), $M_1$ and $M_2$, each with its own accuracy w.r.t. the ground truth. I also calculate Cohen's Kappa between each model's predictions and the ground truth, as a measure of their agreement.

Can I expect that the model with higher accuracy will also have a higher Cohen's Kappa, i.e. better agreement with the ground truth? How can I prove (or refute) this?
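
For concreteness, with the ground truth playing the role of the second rater (the usage discussed in the comments below), Cohen's Kappa takes its standard form

$$\kappa = \frac{p_0 - p_e}{1 - p_e},$$

where $p_0$ is the observed agreement (here, simply the accuracy) and $p_e$ is the chance agreement computed from the marginal class frequencies of the predictions and of the ground truth.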

  • Kappa is a measure of interrater reliability. Accuracy (at least for classifiers) is a measure of how well a model classifies observations. They aren't comparable quantities. – Peter Flom Dec 15 '19 at 17:22
  • @PeterFlom-ReinstateMonica Could you at least give me an example (for the same case) where increasing accuracy causes a decrease in kappa? That would be enough for me; I have been unable to find one. – vbn Dec 16 '19 at 07:20
  • The term "the kappa of M1" is meaningless. Cohen's Kappa is a relation between two models. You can ask about the Kappa of M1 and M2, for example. – Itamar Mushkin Dec 16 '19 at 07:28
  • Theoretically, you can ask about the "Kappa of M1 and the ground truth". In this case, you can see from the link you've shared about Cohen's Kappa that it is just a modified accuracy: it is equal to $(p_0 - p_e)/(1 - p_e)$, where $p_0$ is just the accuracy. So, no, if you ask about "the Kappa of M1 and the ground truth", then there is no situation in which an increase in accuracy causes a decrease in this (probably ill-named) Kappa. – Itamar Mushkin Dec 16 '19 at 07:30
  • @ItamarMushkin Could you please make a more formal demonstration? I have been unable to make one. – vbn Dec 16 '19 at 08:25
  • @ItamarMushkin Yeah, the comparison is between the model and the ground truth. That's how it's normally used in machine learning, so it was obvious to me. – vbn Dec 16 '19 at 08:27
  • If this is the usage you meant, please edit your question accordingly. I did not get that usage, and it took me a while to find something similar online (here: https://thedatascientist.com/performance-measures-cohens-kappa-statistic/ - one rater is just guessing according to class frequency). – Itamar Mushkin Dec 16 '19 at 10:16
  • After you edit your question and it's clear enough to be reopened, I'll try elaborating a bit more, but really there's nothing more to it than what I've said in my comment. – Itamar Mushkin Dec 16 '19 at 10:17
  • I've suggested an edit in agreement with what you've asked in comments. Feel free to elaborate or correct it. – Itamar Mushkin Dec 16 '19 at 10:22
  • No, I can't, because they measure different things. It's like saying "Can you give me an example where raising the temperature increases length?" – Peter Flom Dec 16 '19 at 10:51
  • @ItamarMushkin I have accepted your edit; it was exactly what I meant. The demonstration doesn't need to be very complex or formal; a small explanation is enough. – vbn Dec 16 '19 at 13:03
  • In that case, I gave the explanation in a previous comment: Cohen's Kappa is just equal to $(p_0 - p_e)/(1 - p_e)$, where $p_0$ is just the accuracy. So, it increases monotonically with accuracy. That's it. – Itamar Mushkin Dec 16 '19 at 13:42
  • I hope that the question (after my edit) will be re-opened, so I can move the answer from comments to answer (and delete most of the comments, or at least mine). – Itamar Mushkin Dec 16 '19 at 13:43
  • @PeterFlom-ReinstateMonica Please re-open the question now it's clear what I want. – vbn Dec 16 '19 at 16:57
  • @ItamarMushkin But $p_e$ also changes when accuracy changes (see the sketch after these comments). – vbn Dec 16 '19 at 17:00
  • @ItamarMushkin Any ideas? – vbn Dec 18 '19 at 09:13
  • The more I search, the less I understand how to use Cohen's Kappa to compare an estimator to the ground truth. The SKLearn page warns explicitly against this: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html – Itamar Mushkin Dec 23 '19 at 12:36
  • The accepted answer here touches on your question: https://stats.stackexchange.com/questions/303149/cohens-kappa-as-a-classifier-strength-estimator?rq=1, specifically look at the part about how to compute p_e – Itamar Mushkin Dec 23 '19 at 14:26
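
The point about $p_e$ can be made concrete with a small sketch. The class counts below are made up purely for illustration, and the metrics come from scikit-learn's accuracy_score and cohen_kappa_score; with these made-up labels, the model with the higher accuracy ends up with the lower Kappa, because always predicting the majority class drives its chance agreement $p_e$ up.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical ground truth (made up for illustration): 90 samples of class 0, 10 of class 1.
y_true = np.array([0] * 90 + [1] * 10)

# M1 always predicts the majority class: 90/100 correct.
y_pred_m1 = np.zeros(100, dtype=int)

# M2 is right less often overall (85/100 correct) but predicts both classes:
# 80 of the 90 class-0 samples and 5 of the 10 class-1 samples are correct.
y_pred_m2 = np.array([0] * 80 + [1] * 10 + [1] * 5 + [0] * 5)

for name, y_pred in [("M1", y_pred_m1), ("M2", y_pred_m2)]:
    acc = accuracy_score(y_true, y_pred)
    kappa = cohen_kappa_score(y_true, y_pred)
    print(f"{name}: accuracy = {acc:.2f}, Cohen's kappa = {kappa:.3f}")
```

With these labels, M1 has accuracy 0.90 but Kappa 0.0 (its $p_e$ is 0.9), while M2 has accuracy 0.85 but Kappa of roughly 0.32, so higher accuracy does not by itself guarantee higher Kappa.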