What is a good measure of inter-rater agreement when the following two conditions hold?
- each annotation is a ranked list of 3 elements (which the annotator chooses from a set of 10 elements),
- there are more than 2 annotators (a.k.a. raters).
Example:
I have 1000 short texts, 10 sentiment types (e.g., "happy", "funny", "sarcastic", or "ironic"), and 5 human annotators. I am asking each annotator to go over each of the 1000 short texts and, for each one, indicate as a ranked list which 3 sentiments are the most tangible in it. For example, human annotator #1 might decide that short text #361 is ["sarcastic", "ironic", "funny"], meaning that short text #361 is more "sarcastic" than "ironic", more "ironic" than "funny", and more "funny" than any of the other 7 sentiments. (A sketch of the resulting data is given below.)
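To make the data shape concrete, here is a minimal Python sketch (the names `SENTIMENTS`, `annotations`, and `to_rank_vector`, as well as the 6 extra sentiment labels, are placeholders I made up for illustration):

```python
# Placeholder list of the 10 possible sentiments; only the first four
# appear in the question, the remaining six are made-up fillers.
SENTIMENTS = [
    "happy", "funny", "sarcastic", "ironic", "angry",
    "sad", "fearful", "surprised", "neutral", "hopeful",
]

# annotations[annotator_id][text_id] is a ranked list of exactly 3 sentiments,
# ordered from most to least tangible.
annotations = {
    1: {361: ["sarcastic", "ironic", "funny"]},   # annotator #1, short text #361
    2: {361: ["ironic", "sarcastic", "happy"]},   # another annotator may disagree
    # ... 5 annotators x 1000 short texts in total
}

def to_rank_vector(top3):
    """Expand a ranked top-3 list into ranks over all 10 sentiments, giving the
    7 unranked sentiments the tied mid-rank 7 (= average of positions 4..10).
    This reflects the reading "more 'funny' than any of the other 7 sentiments"."""
    tied = (4 + len(SENTIMENTS)) / 2
    return [top3.index(s) + 1 if s in top3 else tied for s in SENTIMENTS]

print(to_rank_vector(annotations[1][361]))
# -> [7.0, 3, 1, 2, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0]
```

An agreement measure would presumably operate on something like these per-annotator rank vectors (one per annotator per text), but which measure is appropriate here is exactly my question.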