
I am running an experiment with a finite set of raters and a finite set of items, where each rater provides a subjective judgment about each item. The goal is to measure how important those items are to the raters.

For every item, each rater uses a Likert-like scale (1 = Unimportant, 2 = Of Little Importance, 3 = Moderately Important, 4 = Important, 5 = Very Important).

Knowing that the judgments are subjective, I want to measure how well the raters agree in their ratings, and eventually observe new patterns in their judgments.

The question is: which statistical method/tool is most appropriate for such an analysis?


2 Answers


Cohen's kappa is the most common statistic for assessing interrater agreement. Fleiss' kappa is a version of it that extends to more than two raters. There are also a weighted kappa, which takes the distance between rating categories into account, and the intraclass correlation coefficient.
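
For concreteness, here is a minimal sketch (not from the original answer) of two of these statistics in Python, using statsmodels for Fleiss' kappa and scikit-learn for a weighted (ordinal) Cohen's kappa; the ratings matrix is made up purely for illustration.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa
from sklearn.metrics import cohen_kappa_score

# Hypothetical data: rows = items, columns = raters, values = Likert scores 1-5.
ratings = np.array([
    [5, 4, 5, 4],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [1, 1, 2, 1],
    [4, 5, 5, 4],
])

# Fleiss' kappa handles any number of raters; it expects a table of counts per
# (item, category), which aggregate_raters builds from the raw ratings.
counts, categories = aggregate_raters(ratings)
print("Fleiss' kappa:", fleiss_kappa(counts, method="fleiss"))

# Weighted Cohen's kappa compares two raters at a time but respects the
# ordering of the Likert categories (quadratic weights penalize distant
# disagreements more than adjacent ones).
print("Weighted Cohen's kappa (rater 1 vs rater 2):",
      cohen_kappa_score(ratings[:, 0], ratings[:, 1], weights="quadratic"))
```

In both cases, values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance.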

  • Thanks @Michael for your suggestion. After googling for it, I found the kappa method, but it seems to be used when you evaluate raters' judgments against a standard. In my case there is no standard, as every judgment is completely subjective. – Jul 15 '12 at 20:33
  • Kappa measures interrater agreement. A rating system is assumed, like your Likert scale; that is all that is meant by comparison to a standard. You need a score to know whether there is complete agreement or some degree of disagreement. I have used it many times to do exactly what you want. – Michael R. Chernick Jul 15 '12 at 20:46
  • According to the documentation I found on the subject, there is the kappa method, used for qualitative nominal data (with no logical order between scale categories), and the intraclass correlation coefficient (ICC) method, used for qualitative ordinal data (with a logical order between scale categories, e.g. grading). I wonder whether the ICC method isn't more appropriate for my situation, as my categories have a logical order ... – Jul 15 '12 at 21:13
  • In the two-category case there is no distinction. If you have 3 or more categories and want to incorporate the ordering, kappa will not do that. – Michael R. Chernick Jul 15 '12 at 21:25
  • I'll try the kappa method, based on a nice example I found on Wikipedia. Maybe it would be interesting to see whether I obtain similar results with the ICC. Thanks @Michael for your help. – Jul 15 '12 at 21:51
  • Would ICC work given the small sample? – Cesare Camestre Jul 19 '13 at 13:29

I would add some points. For the case of random raters and random items, the intraclass correlation offers several indexes; see the ICC article on Wikipedia and the references listed below for details. To compare raters, in one study I compared them not only in terms of agreement (ICC) but also in terms of mean value. I understand that your data are on an ordinal scale, but there are many other factors to take into account.

  • Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163.

  • Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.
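
As a concrete illustration of the ICC forms described in these references, below is a minimal sketch (not part of the original answer) using the pingouin package; the item/rater/rating values are made up, and the case of random raters and random items mentioned above corresponds to the two-way random-effects forms (ICC2/ICC2k).

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: one row per (item, rater) pair, holding the
# Likert rating that rater gave to that item.
data = pd.DataFrame({
    "item":   ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"],
    "rater":  ["R1", "R2", "R3"] * 4,
    "rating": [5, 4, 5, 2, 1, 2, 3, 3, 4, 1, 1, 2],
})

# intraclass_corr reports all six Shrout & Fleiss forms (ICC1..ICC3 and their
# average-measure counterparts ICC1k..ICC3k) with F-tests and confidence
# intervals; ICC2/ICC2k treat both raters and items as random effects.
icc = pg.intraclass_corr(data=data, targets="item", raters="rater",
                         ratings="rating")
print(icc)
```

Koo & Li (2016) also give rules of thumb for interpreting the resulting coefficients (roughly: below 0.5 poor, 0.5 to 0.75 moderate, 0.75 to 0.9 good, above 0.9 excellent reliability).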
