I'm out of my element when dealing with statistics, so I hope you'll be able to offer me some guidance.

I'm working on a project where students will apply for scholarships, and then a panel of reviewers will independently evaluate each application and score a number of questions on a scale of 1 through 5.

Given this, a theoretical data set might look like:

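| Student | Reviewer  | Score 1 | Score 2 | Score 3 | Score 4 |
|---------|-----------|---------|---------|---------|---------|
| Anna    | Mr. Jones | 5       | 4       | 5       | 5       |
| Anna    | Ms. Smith | 4       | 4       | 4       | 5       |
| Anna    | Mr. Quinn | 4       | 3       | 5       | 4       |
| Anna    | Mr. Blair | 5       | 5       | 4       | 3       |
| Anna    | Ms. Brown | 4       | 4       | 3       | 4       |
| Billy   | Mr. Jones | 5       | 4       | 4       | 4       |
| Billy   | Ms. Smith | 3       | 4       | 3       | 3       |
| Billy   | Ms. Brown | 4       | 4       | 2       | 4       |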

I would like to create a single value per student that fairly represents all of that student's scores in the data set, taking into account the number of reviews completed for the student. (Note that in this data set, "Anna" was reviewed five times, while "Billy" was reviewed only three times.)

What mathematical process should I use to create such a value? I've considered the obvious average of all scores, but does the fact that Anna had more reviews than Billy change the statistical relevance of that simple calculation? Is there something more that should be done to account for the variation in number of reviews?

Desired Outcome:

| Student | Overall / Aggregate Score |
|---------|---------------------------|
| Anna    | ???                       |
| Billy   | ???                       |

1 Answer

An average seems a sensible way of comparing the scores. The only thing to take into account is that if a student has fewer reviews, that student's average score will be more sensitive to the presence of a very strict or very generous reviewer, while students with more reviewers will have a fairer average score. Coming back to your questions:

"does the fact that Anna had more reviews than Billy change the statistical relevance of that simple calculation?"

The higher the number of reviewers, the more statistically reliable the score is. The lower the number, the more subject to chance it is. Specifically, if the standard deviation of the reviewers' scores is $\sigma_r$, then the variability $\sigma_s$ of the student's average score is $\sigma_s \approx \sigma_r / \sqrt{N}$, where $N$ is the number of reviewers. You want $\sigma_s$ to be small, so you can either reduce $\sigma_r$ (that is, have very good reviewers) or increase $N$ (have more reviewers, even if they are individually less reliable, e.g. because they have less expertise).
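As a rough illustration of the $\sigma_r / \sqrt{N}$ relation, here is a minimal Python sketch using the example data from the question. It rests on two assumptions that are not part of the original discussion: each review is first collapsed to the mean of its four question scores, and $\sigma_r$ is estimated separately for each student as the sample standard deviation of those per-review means. The function name `summarize` is purely illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Scores from the question's example data set: one inner list per review.
reviews = {
    "Anna": [[5, 4, 5, 5], [4, 4, 4, 5], [4, 3, 5, 4], [5, 5, 4, 3], [4, 4, 3, 4]],
    "Billy": [[5, 4, 4, 4], [3, 4, 3, 3], [4, 4, 2, 4]],
}


def summarize(student_reviews):
    """Return (average, sigma_s, N) for one student's list of reviews."""
    # Assumption: a review is summarized by the mean of its question scores.
    per_review = [mean(r) for r in student_reviews]
    n = len(per_review)
    average = mean(per_review)
    # Assumption: sigma_r is estimated from this student's own reviews
    # (sample standard deviation, which needs at least two reviews).
    sigma_r = stdev(per_review)
    sigma_s = sigma_r / sqrt(n)  # variability of the average score
    return average, sigma_s, n


results = {student: summarize(r) for student, r in reviews.items()}

for student, (average, sigma_s, n) in results.items():
    print(f"{student}: average {average:.2f}, sigma_s {sigma_s:.2f}, N = {n}")
```

With this data, Anna's average is about 4.2 with $\sigma_s \approx 0.17$, while Billy's is about 3.67 with $\sigma_s \approx 0.30$, so Billy's average is noticeably less certain. With only three to five reviews per student, these per-student estimates of $\sigma_r$ are themselves quite noisy; pooling the spread across all students would be a reasonable alternative.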

"Is there something more that should be done to account for the variation in number of reviews?"

You could set a minimum number of reviewers to reduce the chance of a student getting an unrepresentative score as a consequence of a few extreme or biased reviews. Also, at the end you could check whether any students who scored exceedingly well or badly happen to be among those with the fewest reviews. For those cases, further reviewing may be advisable as a sanity check, to ensure that the average score is actually representative of the student's performance.
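The two checks above could be sketched in Python as follows. This is only one possible reading of the suggestion: the function name `flag_for_extra_review` and the thresholds `min_reviews`, `few_reviews`, `low`, and `high` are hypothetical and would need to be chosen for the actual scholarship process. The averages and review counts are the ones implied by the question's example data.

```python
# (average score, number of reviews) per student, from the question's example data.
summary = {
    "Anna": (4.2, 5),
    "Billy": (11 / 3, 3),
}


def flag_for_extra_review(summary, min_reviews=4, few_reviews=4, low=2.0, high=4.5):
    """Return {student: reason} for students who may need further reviewing.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    flagged = {}
    for student, (average, n) in summary.items():
        if n < min_reviews:
            # Rule 1: enforce a minimum number of reviews.
            flagged[student] = f"fewer than {min_reviews} reviews"
        elif n <= few_reviews and not (low < average < high):
            # Rule 2: an extreme average that rests on only a few reviews.
            flagged[student] = "extreme average from few reviews"
    return flagged


# With a minimum of four reviews, Billy (three reviews) is flagged.
print(flag_for_extra_review(summary))

# With a minimum of three and a stricter "high" cutoff, Billy is flagged
# by the second rule instead.
print(flag_for_extra_review(summary, min_reviews=3, few_reviews=3, high=3.5))
```

Flagged students are only candidates for another look; the sketch does not change anyone's score.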

rasmodius