Questions tagged [agreement-statistics]

Agreement is the degree to which two raters, instruments, etc., give the same value (rating / measurement) when applied to the same object. Agreement can be assessed to determine if one measurement can be substituted for another, the reliability of a measurement, etc. Trying to assess agreement using a correlation coefficient (or perhaps a chi-squared test for categorical variables) is a very common and intuitive mistake. Special statistical methods have been designed for this task.
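
A quick numerical illustration of why correlation is not agreement (a minimal sketch with simulated data; the numbers are made up): if a second rater is systematically 10 points more generous, the Pearson correlation is perfect even though the two raters never give the same value.

```python
import numpy as np

rng = np.random.default_rng(0)
scores_a = rng.normal(50, 10, size=30)   # rater A's scores for 30 objects
scores_b = scores_a + 10                 # rater B is systematically 10 points higher

r = np.corrcoef(scores_a, scores_b)[0, 1]
diff = scores_b - scores_a
print(f"Pearson r = {r:.2f}")                                   # 1.00
print(f"difference B - A: mean = {diff.mean():.1f}, sd = {diff.std():.1f}")
# perfect correlation, yet rater B is 10 points higher on every single object
```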

445 questions
13
votes
4 answers

How can I best deal with the effects of markers with differing levels of generosity in grading student papers?

Around 600 students have a score on an extensive piece of assessment, which can be assumed to have good reliability/validity. The assessment is scored out of 100, and it's a multiple-choice test marked by computer. Those 600 students also have…
7
votes
3 answers

Inter-rater statistic for skewed rankings

I have several sets of 10 raters which I want to compare. Each rater can cast only a Yes or No vote; however, this decision is skewed and Yes votes make up only about 10% of all votes (and this is expected, i.e. such a proportion is objectively…
user88
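
For the skewed-prevalence situation described above, the usual caveat is that chance-corrected statistics such as Cohen's kappa can look poor even when raw agreement is high, because expected agreement under independence is driven up by the dominant category. A minimal two-rater sketch with made-up counts (a rare "Yes" category) shows the effect:

```python
# Hypothetical 2x2 table for two raters on 100 items, ~6% "Yes" prevalence:
#                 rater B: Yes   No
yes_yes, yes_no = 2, 4          # rater A said Yes
no_yes,  no_no  = 4, 90         # rater A said No
n = yes_yes + yes_no + no_yes + no_no

p_o = (yes_yes + no_no) / n                      # observed agreement
a_yes = (yes_yes + yes_no) / n                   # rater A's "Yes" marginal
b_yes = (yes_yes + no_yes) / n                   # rater B's "Yes" marginal
p_e = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement under independence

kappa = (p_o - p_e) / (1 - p_e)
print(f"raw agreement = {p_o:.2f}, chance agreement = {p_e:.3f}, kappa = {kappa:.2f}")
# raw agreement = 0.92, chance agreement = 0.887, kappa = 0.29
```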
7
votes
1 answer

Comparing inter-rater agreement between classes of raters

I'm interested in comparing ratings of the same objects that were done by raters from 2 different GROUPS/CLASSES (Experts, and Semi-Experts), so I can decide whether Semi-experts can replace experts in my setting. Though I thought I'd easily find a…
ynagar
  • 143
  • 1
  • 8
5
votes
1 answer

Calculating inter-rater reliability where raters and ratees only partially overlap

I'm new to inter-rater reliability calculations. I have 5 developers and 6 raters. The raters ranked the first 3 of these developers based on some criteria, as in the following example: dev1 dev2 dev3 dev4 dev5 ranker1 …
3
votes
2 answers

Inter-rater agreement with only one subject?

I'm trying to describe inter-rater reliability for a group of 5 researchers working on the same project. We're administering the Hamilton Anxiety rating scale which has 14 items, each item having 4 ordinal levels. Each rater has watched the same…
Lachlan
  • 1,192
3
votes
2 answers

Agreement among raters with missing data

Let $M$ be an $n \times k$ matrix which is the outcome of a subjective test, where $n$ is the number of samples and $k$ is the number of raters. Values in $M$ can range from 0 to 1 in steps of 0.1. Since the number of samples is high and the evaluation…
firion
  • 235
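
For incomplete rating matrices like the one described above, Krippendorff's alpha is often suggested because it is defined over pairable values and simply skips missing entries. Below is a minimal, self-contained sketch of the interval-metric version (squared-difference distance), with np.nan marking missing ratings; the example matrix is made up.

```python
import numpy as np

def krippendorff_alpha_interval(m):
    """Krippendorff's alpha, interval metric (squared differences), np.nan = missing.

    m: (n_samples x k_raters) array of ratings."""
    m = np.asarray(m, dtype=float)
    # a unit (sample) is "pairable" only if at least two raters scored it
    units = [row[~np.isnan(row)] for row in m]
    units = [u for u in units if u.size >= 2]
    n = sum(u.size for u in units)               # total number of pairable values
    if n <= 1:
        return np.nan
    # observed disagreement: squared differences within each unit,
    # each unit weighted by 1 / (m_u - 1)
    d_o = sum((np.subtract.outer(u, u) ** 2).sum() / (u.size - 1) for u in units) / n
    # expected disagreement: squared differences over all pairs of pairable values
    pooled = np.concatenate(units)
    d_e = (np.subtract.outer(pooled, pooled) ** 2).sum() / (n * (n - 1))
    return np.nan if d_e == 0 else 1.0 - d_o / d_e

# made-up example: 4 samples, 5 raters, some ratings missing
M = np.array([
    [0.1, 0.2, np.nan, 0.1,    0.2],
    [0.5, 0.4, 0.5,    np.nan, 0.6],
    [0.9, 1.0, 0.8,    0.9,    np.nan],
    [0.3, np.nan, 0.3, 0.4,    0.3],
])
print(krippendorff_alpha_interval(M))
```
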
3
votes
1 answer

Advantages and limitations of Gwet’s AC1 statistic and PABAK

I am looking for alternatives to the kappa to assess inter-rater agreement. I've come across two hopefuls: Gwet's AC1 statistic and PABAK. What are the advantages and disadvantages of each?
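
The trade-off is easiest to see numerically in the binary two-rater case. In the sketch below (made-up counts with a rare "Yes" category), kappa is depressed by the skewed marginals, while PABAK (which fixes chance agreement at 0.5 for two categories) and Gwet's AC1 (which bases chance agreement on the average marginal) stay high; that difference in how chance agreement is defined is the main point of contention in the literature.

```python
# Kappa, PABAK and Gwet's AC1 for two raters and a binary (Yes/No) rating,
# computed from a hypothetical 2x2 table of counts.
yes_yes, yes_no, no_yes, no_no = 2, 4, 4, 90
n = yes_yes + yes_no + no_yes + no_no

p_o = (yes_yes + no_no) / n                       # observed agreement
a_yes = (yes_yes + yes_no) / n                    # marginal "Yes" rate, rater A
b_yes = (yes_yes + no_yes) / n                    # marginal "Yes" rate, rater B

# Cohen's kappa: chance agreement from the product of the raters' marginals
p_e_kappa = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
kappa = (p_o - p_e_kappa) / (1 - p_e_kappa)

# PABAK: chance agreement fixed at 0.5 (two categories), so PABAK = 2*Po - 1
pabak = 2 * p_o - 1

# Gwet's AC1: chance agreement from the average marginal, 2*pi*(1-pi)
pi_yes = (a_yes + b_yes) / 2
p_e_ac1 = 2 * pi_yes * (1 - pi_yes)
ac1 = (p_o - p_e_ac1) / (1 - p_e_ac1)

print(f"kappa = {kappa:.2f}, PABAK = {pabak:.2f}, AC1 = {ac1:.2f}")
# with these counts: kappa ≈ 0.29, PABAK = 0.84, AC1 ≈ 0.91
```
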
2
votes
0 answers

Intercoder reliability for multi-label categorical data

I have a task where multiple coders can apply non-mutually exclusive categories to a list of subjects, e.g., more categories can be applied to the same subject. What could be the most appropriate method for assessing intercoder reliability for each…
2
votes
1 answer

Which measure for inter-rater agreement for continuous data of 2 raters about multiple subjects in multiple situations?

I've considered measures like Cohen's kappa (but the data are continuous), intraclass correlation (reliability, not agreement), standard correlation (will be high when one rater always rates consistently higher than the other rater)... but none seem to…
Caeline
  • 21
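
For two raters and continuous scores, Lin's concordance correlation coefficient is one commonly suggested option: unlike the Pearson correlation, it penalizes a systematic offset between the raters. A minimal sketch with simulated data (the 8-point offset is made up):

```python
import numpy as np

def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for two continuous raters."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.cov(x, y, ddof=0)[0, 1]     # population covariance, as in Lin (1989)
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

rng = np.random.default_rng(1)
truth = rng.normal(50, 10, 40)
rater_1 = truth + rng.normal(0, 2, 40)
rater_2 = truth + 8 + rng.normal(0, 2, 40)   # consistently ~8 points higher

print(np.corrcoef(rater_1, rater_2)[0, 1])   # high: the offset does not hurt correlation
print(concordance_ccc(rater_1, rater_2))     # noticeably lower: the offset is penalized
```
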
1
vote
1 answer

Fleiss Kappa score - NaN for perfect agreement

I am working on a dataset that has three raters. Ratings are Yes/No. I have a set of ratings where all raters said No. I used an R package and Excel formulas to calculate the kappa score. Both return NaN. I get why it returns NaN. Because the expected…
akalanka
  • 111
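
The NaN arises because kappa-type statistics divide by one minus the expected agreement, and when every rating falls in a single category the expected agreement is itself 1, so the statistic is 0/0. A minimal hand computation of Fleiss' kappa makes this explicit (made-up data: 3 raters, 10 items, everyone saying No):

```python
import numpy as np

# counts[i, j] = number of raters assigning item i to category j (columns: No, Yes)
counts = np.array([[3, 0]] * 10, dtype=float)   # 10 items, all ratings are "No"
N, k = counts.shape
n = counts.sum(axis=1)[0]                       # raters per item (3)

P_i = (np.sum(counts ** 2, axis=1) - n) / (n * (n - 1))  # per-item agreement
P_bar = P_i.mean()                              # observed agreement: 1.0
p_j = counts.sum(axis=0) / (N * n)              # category proportions: [1, 0]
P_e = np.sum(p_j ** 2)                          # expected agreement: 1.0

kappa = (P_bar - P_e) / (1 - P_e)               # 0/0 -> nan (NumPy warns, then yields nan)
print(P_bar, P_e, kappa)
```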
1
vote
0 answers

Agreement among >2 raters. Which rater disagrees more?

There are 10 raters giving a value on an ordinal scale for 20 instruments. I applied Krippendorff's alpha to calculate the overall agreement among the raters, which is equal to 0.7 (tentative agreement). How can I identify the rater who disagrees…
Gregory
  • 103
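
One common way to approach this is a leave-one-out analysis: recompute alpha with each rater removed and see whose removal raises it the most. The sketch below assumes the third-party Python package krippendorff and its alpha(reliability_data=..., level_of_measurement=...) interface, with raters as rows and np.nan for missing values; the ratings are random placeholders just to show the mechanics.

```python
import numpy as np
import krippendorff   # third-party package; assumed installed via `pip install krippendorff`

# one row per rater, one column per instrument, np.nan = missing
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(10, 20)).astype(float)   # placeholder ordinal data

def alpha(data):
    # assumption: krippendorff.alpha() accepts a (raters x units) matrix via
    # `reliability_data` and an ordinal level of measurement
    return krippendorff.alpha(reliability_data=data, level_of_measurement="ordinal")

overall = alpha(ratings)
for i in range(ratings.shape[0]):
    loo = alpha(np.delete(ratings, i, axis=0))   # alpha without rater i
    print(f"rater {i}: alpha without them = {loo:.3f} (change {loo - overall:+.3f})")
# the rater whose removal raises alpha the most disagrees most with the rest of the panel
```
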
1
vote
2 answers

Fantasy Football Expert Consensus Rankings

Suppose there are n experts who rank m players. Let's assume there are no ties, a lower ranking is considered "better", and the rankings done by each expert are complete. For each player, we know: the best rank, the worst rank, the average rank, the…
j18w
  • 13
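
The summaries listed in the question (best, worst and average rank) are aggregates, but if the full ranking from each expert is available, Kendall's coefficient of concordance W is a standard measure of how much the experts agree. A minimal sketch for complete rankings without ties:

```python
import numpy as np

def kendalls_w(ranks):
    """Kendall's coefficient of concordance W for complete rankings without ties.

    ranks: (n_experts x m_players) array; ranks[i, j] is the rank expert i gave player j."""
    ranks = np.asarray(ranks, dtype=float)
    n, m = ranks.shape
    rank_sums = ranks.sum(axis=0)                   # R_j for each player
    s = np.sum((rank_sums - n * (m + 1) / 2) ** 2)  # squared deviations from the mean rank sum
    return 12 * s / (n ** 2 * (m ** 3 - m))

# toy example: 3 experts ranking 4 players (1 = best)
ranks = np.array([
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
])
print(kendalls_w(ranks))   # ≈ 0.78: substantial agreement among the experts
```
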
1
vote
0 answers

Measure of agreement for multiple elements within multiple items

Two raters assessed the risk of bias in 3 randomised controlled trial (RCT) research articles using a standard critical appraisal checklist. The checklist has 10 questions addressing key risks of bias in RCTs. Responses to the questions were dichotomous…
1
vote
0 answers

What is a good measure of the inter-rater agreement when each annotation is a ranked list of 3 elements and there are more than 2 annotators?

What is a good measure of the inter-rater agreement when one has the following two conditions: each annotation is a ranked list of 3 elements (the annotator can choose them from among 10 elements), and there are more than 2 annotators (a.k.a.…
Franck Dernoncourt
  • 46,817
  • 33
  • 176
  • 288
1
vote
0 answers

Agreement between two groups of raters

What is the best method for assessing agreement between two groups? I have 2 ICU docs and 2 radiologists evaluating x-ray exams with a yes/no outcome. I can use Cohen's kappa for agreement within each group, but how can I assess agreement between the…
Summer
  • 11