Questions tagged [agreement-statistics]

Agreement is the degree to which two raters, instruments, etc., give the same value (rating / measurement) when applied to the same object. Agreement can be assessed to determine if one measurement can be substituted for another, the reliability of a measurement, etc. Trying to assess agreement using a correlation coefficient (or perhaps a chi-squared test for categorical variables) is a very common and intuitive mistake. Special statistical methods have been designed for this task.
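
A quick numerical illustration of why correlation is not agreement (a minimal sketch with simulated data; the numbers are made up): if a second rater is systematically 10 points more generous, the Pearson correlation is perfect even though the two raters never give the same value.

```python
import numpy as np

rng = np.random.default_rng(0)
scores_a = rng.normal(50, 10, size=30)   # rater A's scores for 30 objects
scores_b = scores_a + 10                 # rater B is systematically 10 points higher

r = np.corrcoef(scores_a, scores_b)[0, 1]
diff = scores_b - scores_a
print(f"Pearson r = {r:.2f}")                                   # 1.00
print(f"difference B - A: mean = {diff.mean():.1f}, sd = {diff.std():.1f}")
# perfect correlation, yet rater B is 10 points higher on every single object
```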

445 questions
13
votes
4 answers

How can I best deal with the effects of markers with differing levels of generosity in grading student papers?

Around 600 students have a score on an extensive piece of assessment, which can be assumed to have good reliability/validity. The assessment is scored out of 100, and it's a multiple-choice test marked by computer. Those 600 students also have…
7
votes
3 answers

Inter-rater statistic for skewed rankings

I have several sets of 10 raters which I want to compare. Each rater can cast only a Yes or No vote; however, this decision is skewed and Yes votes make up only about 10% of all votes (and this is expected, i.e. such a proportion is objectively…
user88
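
For the skewed-prevalence situation described above, the usual caveat is that chance-corrected statistics such as Cohen's kappa can look poor even when raw agreement is high, because expected agreement under independence is driven up by the dominant category. A minimal two-rater sketch with made-up counts (a rare "Yes" category) shows the effect:

```python
# Hypothetical 2x2 table for two raters on 100 items, ~6% "Yes" prevalence:
#                 rater B: Yes   No
yes_yes, yes_no = 2, 4          # rater A said Yes
no_yes,  no_no  = 4, 90         # rater A said No
n = yes_yes + yes_no + no_yes + no_no

p_o = (yes_yes + no_no) / n                      # observed agreement
a_yes = (yes_yes + yes_no) / n                   # rater A's "Yes" marginal
b_yes = (yes_yes + no_yes) / n                   # rater B's "Yes" marginal
p_e = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement under independence

kappa = (p_o - p_e) / (1 - p_e)
print(f"raw agreement = {p_o:.2f}, chance agreement = {p_e:.3f}, kappa = {kappa:.2f}")
# raw agreement = 0.92, chance agreement = 0.887, kappa = 0.29
```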
7
votes
1 answer

Comparing inter-rater agreement between classes of raters

I'm interested in comparing ratings of the same objects that were done by raters from 2 different GROUPS/CLASSES (Experts, and Semi-Experts), so I can decide whether Semi-experts can replace experts in my setting. Though I thought I'd easily find a…
ynagar
  • 143
  • 1
  • 8
5
votes
1 answer

Calculating inter-rater reliability where raters and ratees only partially overlap

I'm new to inter-rater reliability calculations. I have 5 developers and 6 raters. The raters ranked the first 3 of these developers based on some criteria, as in the following example: dev1 dev2 dev3 dev4 dev5 ranker1 …
3
votes
2 answers

Inter-rater agreement with only one subject?

I'm trying to describe inter-rater reliability for a group of 5 researchers working on the same project. We're administering the Hamilton Anxiety rating scale which has 14 items, each item having 4 ordinal levels. Each rater has watched the same…
Lachlan
  • 1,192
3
votes
2 answers

Agreement among raters with missing data

Let $M$ be an $n \times k$ matrix which is the outcome of a subjective test, where $n$ is the number of samples and $k$ is the number of raters. Values in $M$ can range from 0 to 1 in steps of 0.1. Since the number of samples is high and the evaluation…
firion
  • 235
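
For incomplete rating matrices like the one described above, Krippendorff's alpha is often suggested because it is defined over pairable values and simply skips missing entries. Below is a minimal, self-contained sketch of the interval-metric version (squared-difference distance), with np.nan marking missing ratings; the example matrix is made up.

```python
import numpy as np

def krippendorff_alpha_interval(m):
    """Krippendorff's alpha, interval metric (squared differences), np.nan = missing.

    m: (n_samples x k_raters) array of ratings."""
    m = np.asarray(m, dtype=float)
    # a unit (sample) is "pairable" only if at least two raters scored it
    units = [row[~np.isnan(row)] for row in m]
    units = [u for u in units if u.size >= 2]
    n = sum(u.size for u in units)               # total number of pairable values
    if n <= 1:
        return np.nan
    # observed disagreement: squared differences within each unit,
    # each unit weighted by 1 / (m_u - 1)
    d_o = sum((np.subtract.outer(u, u) ** 2).sum() / (u.size - 1) for u in units) / n
    # expected disagreement: squared differences over all pairs of pairable values
    pooled = np.concatenate(units)
    d_e = (np.subtract.outer(pooled, pooled) ** 2).sum() / (n * (n - 1))
    return np.nan if d_e == 0 else 1.0 - d_o / d_e

# made-up example: 4 samples, 5 raters, some ratings missing
M = np.array([
    [0.1, 0.2, np.nan, 0.1,    0.2],
    [0.5, 0.4, 0.5,    np.nan, 0.6],
    [0.9, 1.0, 0.8,    0.9,    np.nan],
    [0.3, np.nan, 0.3, 0.4,    0.3],
])
print(krippendorff_alpha_interval(M))
```
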
3
votes
1 answer

Advantages and limitations of Gwet’s AC1 statistic and PABAK

I am looking for alternatives to the kappa to assess inter-rater agreement. I've come across two hopefuls: Gwet's AC1 statistic and PABAK. What are the advantages and disadvantages of each?
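
The trade-off is easiest to see numerically in the binary two-rater case. In the sketch below (made-up counts with a rare "Yes" category), kappa is depressed by the skewed marginals, while PABAK (which fixes chance agreement at 0.5 for two categories) and Gwet's AC1 (which bases chance agreement on the average marginal) stay high; that difference in how chance agreement is defined is the main point of contention in the literature.

```python
# Kappa, PABAK and Gwet's AC1 for two raters and a binary (Yes/No) rating,
# computed from a hypothetical 2x2 table of counts.
yes_yes, yes_no, no_yes, no_no = 2, 4, 4, 90
n = yes_yes + yes_no + no_yes + no_no

p_o = (yes_yes + no_no) / n                       # observed agreement
a_yes = (yes_yes + yes_no) / n                    # marginal "Yes" rate, rater A
b_yes = (yes_yes + no_yes) / n                    # marginal "Yes" rate, rater B

# Cohen's kappa: chance agreement from the product of the raters' marginals
p_e_kappa = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
kappa = (p_o - p_e_kappa) / (1 - p_e_kappa)

# PABAK: chance agreement fixed at 0.5 (two categories), so PABAK = 2*Po - 1
pabak = 2 * p_o - 1

# Gwet's AC1: chance agreement from the average marginal, 2*pi*(1-pi)
pi_yes = (a_yes + b_yes) / 2
p_e_ac1 = 2 * pi_yes * (1 - pi_yes)
ac1 = (p_o - p_e_ac1) / (1 - p_e_ac1)

print(f"kappa = {kappa:.2f}, PABAK = {pabak:.2f}, AC1 = {ac1:.2f}")
# with these counts: kappa ≈ 0.29, PABAK = 0.84, AC1 ≈ 0.91
```
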
2
votes
0 answers

Intercoder reliability for multi-label categorical data

I have a task where multiple coders can apply non-mutually exclusive categories to a list of subjects, e.g., more categories can be applied to the same subject. What could be the most appropriate method for assessing intercoder reliability for each…
2
votes
1 answer

Which measure for inter-rater agreement for continuous data of 2 raters about multiple subjects in multiple situations?

I've considered measures like Cohen's kappa (but the data are continuous), intraclass correlation (reliability, not agreement), standard correlation (will be high when one rater always rates consistently higher than the other rater)... but none seem to…
Caeline
  • 21
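
For two raters and continuous scores, Lin's concordance correlation coefficient is one commonly suggested option: unlike the Pearson correlation, it penalizes a systematic offset between the raters. A minimal sketch with simulated data (the 8-point offset is made up):

```python
import numpy as np

def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for two continuous raters."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.cov(x, y, ddof=0)[0, 1]     # population covariance, as in Lin (1989)
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

rng = np.random.default_rng(1)
truth = rng.normal(50, 10, 40)
rater_1 = truth + rng.normal(0, 2, 40)
rater_2 = truth + 8 + rng.normal(0, 2, 40)   # consistently ~8 points higher

print(np.corrcoef(rater_1, rater_2)[0, 1])   # high: the offset does not hurt correlation
print(concordance_ccc(rater_1, rater_2))     # noticeably lower: the offset is penalized
```
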
1
vote
1 answer

Fleiss Kappa score - NaN for perfect agreement

I am working on a dataset that has three raters. Ratings are Yes/No. I have a set of ratings where all raters said No. I used an R package and Excel formulas to calculate the kappa score. Both return NaN. I get why it returns NaN. Because the expected…
akalanka
  • 111
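
The NaN arises because kappa-type statistics divide by one minus the expected agreement, and when every rating falls in a single category the expected agreement is itself 1, so the statistic is 0/0. A minimal hand computation of Fleiss' kappa makes this explicit (made-up data: 3 raters, 10 items, everyone saying No):

```python
import numpy as np

# counts[i, j] = number of raters assigning item i to category j (columns: No, Yes)
counts = np.array([[3, 0]] * 10, dtype=float)   # 10 items, all ratings are "No"
N, k = counts.shape
n = counts.sum(axis=1)[0]                       # raters per item (3)

P_i = (np.sum(counts ** 2, axis=1) - n) / (n * (n - 1))  # per-item agreement
P_bar = P_i.mean()                              # observed agreement: 1.0
p_j = counts.sum(axis=0) / (N * n)              # category proportions: [1, 0]
P_e = np.sum(p_j ** 2)                          # expected agreement: 1.0

kappa = (P_bar - P_e) / (1 - P_e)               # 0/0 -> nan (NumPy warns, then yields nan)
print(P_bar, P_e, kappa)
```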
1
vote
0 answers

Agreement among >2 raters. Which rater disagrees more?

There are 10 raters giving a value on an ordinal scale for 20 instruments. I applied Krippendorff's alpha to calculate the overall agreement among the raters, which is equal to 0.7 (tentative agreement). How can I identify the rater who disagrees…
Gregory
  • 103
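
One common way to approach this is a leave-one-out analysis: recompute alpha with each rater removed and see whose removal raises it the most. The sketch below assumes the third-party Python package krippendorff and its alpha(reliability_data=..., level_of_measurement=...) interface, with raters as rows and np.nan for missing values; the ratings are random placeholders just to show the mechanics.

```python
import numpy as np
import krippendorff   # third-party package; assumed installed via `pip install krippendorff`

# one row per rater, one column per instrument, np.nan = missing
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(10, 20)).astype(float)   # placeholder ordinal data

def alpha(data):
    # assumption: krippendorff.alpha() accepts a (raters x units) matrix via
    # `reliability_data` and an ordinal level of measurement
    return krippendorff.alpha(reliability_data=data, level_of_measurement="ordinal")

overall = alpha(ratings)
for i in range(ratings.shape[0]):
    loo = alpha(np.delete(ratings, i, axis=0))   # alpha without rater i
    print(f"rater {i}: alpha without them = {loo:.3f} (change {loo - overall:+.3f})")
# the rater whose removal raises alpha the most disagrees most with the rest of the panel
```
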
1
vote
2 answers

Fantasy Football Expert Consensus Rankings

Suppose there are n experts who rank m players. Let's assume there are no ties, a lower ranking is considered "better", and the rankings done by each expert are complete. For each player, we know: the best rank, the worst rank, the average rank, the…
j18w
  • 13
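
The summaries listed in the question (best, worst and average rank) are aggregates, but if the full ranking from each expert is available, Kendall's coefficient of concordance W is a standard measure of how much the experts agree. A minimal sketch for complete rankings without ties:

```python
import numpy as np

def kendalls_w(ranks):
    """Kendall's coefficient of concordance W for complete rankings without ties.

    ranks: (n_experts x m_players) array; ranks[i, j] is the rank expert i gave player j."""
    ranks = np.asarray(ranks, dtype=float)
    n, m = ranks.shape
    rank_sums = ranks.sum(axis=0)                   # R_j for each player
    s = np.sum((rank_sums - n * (m + 1) / 2) ** 2)  # squared deviations from the mean rank sum
    return 12 * s / (n ** 2 * (m ** 3 - m))

# toy example: 3 experts ranking 4 players (1 = best)
ranks = np.array([
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
])
print(kendalls_w(ranks))   # ≈ 0.78: substantial agreement among the experts
```
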
1
vote
0 answers

Measure of agreement for multiple elements within multiple items

Two raters assessed the risk of bias in 3 randomised controlled trial (RCT) research articles using a standard critical appraisal checklist. The checklist has 10 questions addressing key risks of bias in RCTs. Responses to the questions were dichotomous…
1
vote
0 answers

What is a good measure of the inter-rater agreement when each annotation is a ranked list of 3 elements and there are more than 2 annotators?

What is a good measure of the inter-rater agreement when one has the following two conditions: each annotation is a ranked list of 3 elements (the annotator can choose them from among 10 elements), and there are more than 2 annotators (a.k.a.…
Franck Dernoncourt
  • 46,817
  • 33
  • 176
  • 288
1
vote
0 answers

Agreement between two groups of raters

What is the best method for assessing agreement between two groups? I have 2 ICU docs and 2 radiologists evaluating x-ray exams with a yes/no outcome. I can use Cohen's kappa for agreement within each group, but how can I assess agreement between the…
Summer
  • 11