The scores on an 80-question multiple-choice test had a mean of 70% correct and a standard deviation (SD) of 10 percentage points (SD of % correct). The theoretical binomial SD (based on N, p and q; SD = sqrt(Npq)) is about 4.1 questions, which is about 5.1 percentage points, so the observed SD is roughly twice the theoretical SD.

I take this higher value to mean that there is real variability in the ability of the test-takers.
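The comparison can be checked numerically. The following is a minimal sketch, assuming every test-taker answers each question correctly with the same probability p = 0.70 (the pure binomial case); the function and variable names are illustrative only. Note that sqrt(Npq) is in units of questions, so it is converted to percentage points before being compared with the observed SD of 10.

```python
import math

def binomial_sd(n_items, p_correct):
    """SD of the number-correct score if every examinee had the same
    probability p_correct on each item (pure binomial noise)."""
    return math.sqrt(n_items * p_correct * (1.0 - p_correct))

n_items = 80
p_correct = 0.70
observed_sd_pct = 10.0  # observed SD, in percentage points (from the question)

sd_items = binomial_sd(n_items, p_correct)   # in questions
sd_pct = 100.0 * sd_items / n_items          # same SD, in percentage points
ratio = observed_sd_pct / sd_pct             # observed vs. binomial, same units

print(f"binomial SD: {sd_items:.2f} questions = {sd_pct:.2f} percentage points")
print(f"observed / binomial SD ratio: {ratio:.2f}")
```

On the same scale the observed SD is a little under twice the binomial SD, which is still consistent with real variability among the test-takers.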

The question is: Is there a simple conceptual or mathematical relationship between the SD of the scores and the reliability of the test?

Joel W.

1 Answer

In classical test theory, reliability is not a property of a test alone; it's a property of a test in a population.

If the mean is higher, you've either changed the test or changed the population, in which case the reliability is different.

In more modern test theory (item response theory) reliability is a property of a score - a person's ability is measured with a certain reliability, which depends on their answers. But we can also talk about the reliability of a test at a particular ability level.

A test where the mean score is 70/80 (0.875) would probably have greater ability to distinguish between low and average ability (because these people can have a score from 0-70) than between average and high ability (because these people can have a score from 70-80).

A plot that shows the relationship between measurement precision and ability is called an information curve: an item information curve for a single item, or a test information curve for the whole test. These curves are not necessarily smooth - here are some examples: https://www.researchgate.net/figure/Item-Information-Curves-Left-Panel-Test-Information-Curve-Upper-Right-Panel-and_fig3_261513596

Jeremy Miles
  • Interesting information, but have you answered the question which asked about the relationship between test reliability and SD of test score? BTW - I clarified the posting: the mean was 70% (not a raw score of 70 out of 80). – Joel W. Jan 20 '22 at 17:32
  • I didn't answer, because I don't think you can change the mean without changing the test. And if you change the test the reliability will change. You expect maximum reliability at 50% correct, but that doesn't mean you will get it. – Jeremy Miles Jan 20 '22 at 17:56
  • Under what conditions would a test with a mean of 50% not have the highest reliability? – Joel W. Jan 20 '22 at 18:30
  • Extreme example: people answering randomly will have a reliability of zero and a 50% mean score. Other extreme example: everyone gets questions 1:10 right and questions 11:20 wrong. – Jeremy Miles Jan 20 '22 at 18:35
  • Interesting. Might you have examples that are more plausible in a real life testing setting? – Joel W. Jan 20 '22 at 18:47
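
The two extreme cases mentioned in the comments can be checked with a small simulation. This is only a sketch under stated assumptions: random answering is modelled as a coin flip per question (as with two-option items), reliability is estimated with KR-20, and the sample size, seed, and helper name `kr20` are arbitrary choices for illustration.

```python
import random

def kr20(responses):
    """KR-20 reliability for a list of examinees, each a list of 0/1 item scores.
    Returns None when total scores have zero variance (reliability undefined)."""
    n_people = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    if var_total == 0:
        return None
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_people  # proportion correct on item j
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)

rng = random.Random(0)  # fixed seed so the run is repeatable
n_people, n_items = 2000, 20

# Case 1: everyone answers at random (assumed 50% chance per question).
guessing = [[rng.randint(0, 1) for _ in range(n_items)] for _ in range(n_people)]

# Case 2: everyone gets questions 1-10 right and 11-20 wrong.
identical = [[1] * 10 + [0] * 10 for _ in range(n_people)]

mean_guessing = sum(sum(row) for row in guessing) / (n_people * n_items)
mean_identical = sum(sum(row) for row in identical) / (n_people * n_items)

print("random answering: mean =", round(mean_guessing, 3), "KR-20 =", kr20(guessing))
print("identical answers: mean =", mean_identical, "KR-20 =", kr20(identical))
```

Both cases have a mean near 50%. In the first, KR-20 comes out close to zero; in the second, all total scores are identical, so the score variance is zero and the coefficient is undefined (returned as `None` here).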