Say I have a model that predicts hair color in a population: $X_1\%$ is brown, $X_2\%$ is black, $X_3\%$ is blonde, and in general color $i$ occurs $X_i\%$ of the time. Now I go out and measure the population and find $Y_1\%$ brown, $Y_2\%$ black, $Y_3\%$ blonde, and in general color $i$ occurs $Y_i\%$ of the time (let's assume this comes from looking at $N$ people). The $X_i$ sum to $100\%$, and so do the $Y_i$.

What would be the proper statistical measure to see if my model accurately predicted the percentage breakdown of hair color?

qgp07
  • You can use a chi-square test to see whether the distribution you got is close enough to the one in the population.

    Something else you can do is compute the RMSE (root mean square error).

    – Abdoul Haki Jun 10 '19 at 14:44
  • Do you really mean predict rather than estimate the population proportions? – Michael R. Chernick Jun 10 '19 at 14:45
  • @MichaelChernick, I don't think so but correct me if I'm using terminology wrong. The model would use independent data to come up with its prediction that would then be tested against the actual population. – qgp07 Jun 10 '19 at 14:59
  • You need to define "accurately" first. – user158565 Jun 10 '19 at 15:33
  • I think you are estimating the proportions. Prediction would apply to determining the color of a future observation. Often we assume the samples are independent and identically distributed. But it is possible to generate data that are correlated. – Michael R. Chernick Jun 10 '19 at 16:09
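
To make the two suggestions in the comments concrete, here is a minimal sketch of a Pearson chi-square goodness-of-fit test and an RMSE, using only the Python standard library. It assumes the model percentages $X_i$ were fixed before looking at the sample, that the $N$ people are independent observations, and that every expected count $N X_i / 100$ is reasonably large (a common rule of thumb is at least 5). The function names `chi2_sf`, `chi_square_gof` and `rmse` are made up for this illustration. Note that the test works on counts, so $N$ matters: the same percentages give a different result for a different sample size.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(chi2 with `dof` degrees of freedom > x).

    Uses the power series for the regularized lower incomplete gamma
    function; adequate for moderate x, not tuned for extreme tails.
    """
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def chi_square_gof(model_pct, observed_pct, n):
    """Return (statistic, degrees of freedom, p-value).

    model_pct    -- the X_i, in percent (fixed in advance, an assumption)
    observed_pct -- the Y_i, in percent, measured on n people
    """
    expected = [n * x / 100.0 for x in model_pct]      # expected counts
    observed = [n * y / 100.0 for y in observed_pct]   # observed counts
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(model_pct) - 1
    return stat, dof, chi2_sf(stat, dof)


def rmse(model_pct, observed_pct):
    """Root mean square error between the two percentage vectors."""
    sq = [(x - y) ** 2 for x, y in zip(model_pct, observed_pct)]
    return math.sqrt(sum(sq) / len(sq))


# Illustrative numbers only, not data from the question.
X = [50.0, 30.0, 20.0]   # model: brown, black, blonde
Y = [45.0, 35.0, 20.0]   # measured on N people
N = 200

stat, dof, p = chi_square_gof(X, Y, N)
print(f"chi-square = {stat:.3f}, dof = {dof}, p = {p:.3f}")
print(f"RMSE = {rmse(X, Y):.3f} percentage points")
```

A small p-value would suggest the observed breakdown is unlikely under the model; a large one only means the sample does not contradict it. The RMSE, by contrast, is a descriptive distance that ignores $N$, so which of the two is appropriate depends on what "accurately" is taken to mean.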

0 Answers