
What is a good measure of inter-rater agreement when the following two conditions hold?

  • each annotation is a ranked list of 3 elements (the annotator chooses them from a set of 10 elements),
  • there are more than 2 annotators (a.k.a. raters).


Example:

I have 1000 short texts, 10 types of sentiment (e.g., "happy", "funny", "sarcastic", or "ironic"), and 5 human annotators. I ask each annotator to go over each of the 1000 short texts and, for each one, indicate, as a ranked list, which 3 sentiments are the most tangible in that text. For example, human annotator #1 might label short text #361 as ["sarcastic", "ironic", "funny"], meaning that short text #361 is more "sarcastic" than "ironic", more "ironic" than "funny", and more "funny" than any of the other 7 sentiments.
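To make the setup concrete, here is a minimal sketch in Python of how such annotations could be represented. The variable names (SENTIMENTS, annotations) and all sentiment names beyond the four mentioned above are hypothetical placeholders; the sketch only illustrates the data shape, not any particular agreement measure.

    # Minimal sketch of the annotation structure described above.
    # SENTIMENTS and annotations are hypothetical names; the 6 sentiments
    # after "ironic" are placeholders, since only 4 are named in the question.
    SENTIMENTS = [
        "happy", "funny", "sarcastic", "ironic",
        "angry", "sad", "fearful", "surprised", "bored", "neutral",
    ]

    # annotations[annotator][text_id] is an ordered list of 3 sentiments,
    # from most to least tangible in that short text.
    annotations = {
        "annotator_1": {
            361: ["sarcastic", "ironic", "funny"],  # example from the question
            # ... one ranked triple per short text (1000 in total)
        },
        # ... annotator_2 through annotator_5
    }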

Franck Dernoncourt