Say we have a variable that can take any value between 0 and 1, and a system that predicts measurements of this variable, giving its estimates as one of 6 discrete levels (say 0, 0.2, 0.4, 0.6, 0.8 and 1.0). We then obtain the actual outcomes of about 100 measurements of this variable on a continuous scale.

Given the discreteness of the predictions, what would be the most appropriate statistic to assess the predictive quality of the system? A Spearman rank correlation, a (non-parametric) regression, an ANOVA plus post hoc test perhaps (given that we have 6 groups of predictions and might want to see whether there's a significant difference between them), or something else?

1 Answer

Depends on how you wish to use those predictions:

  • If you want a predicted value of .1 to mean exactly ".1" and not "something between .05 and .15", then I would just use a simple correlation. That way you take seriously that .1 means .1, and any deviation from that value is a real deviation.

  • If you want to use the predicted value in the second interpretation, then I would categorize the observed values into the same brackets. In that interpretation, an observed value within the predicted bracket does not represent a deviation. I would then use any measure of association for a cross tabulation of (ordered) categorical variables.
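To make the two options concrete, here is a minimal sketch in plain Python. It rests on several assumptions that are not part of the answer: the data are made up for illustration, observed values are bracketed by mapping each one to its nearest prediction level (`nearest_level` is a hypothetical helper), and Goodman-Kruskal gamma stands in for "any measure of association" for ordered categories; Kendall's tau-b or similar would serve equally well.

```python
from collections import Counter
from math import sqrt

LEVELS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]  # the six prediction levels from the question


def pearson(x, y):
    """Plain Pearson correlation: predictions are taken at face value."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def nearest_level(value, levels=LEVELS):
    """Assumed bracketing rule: map an observation to its nearest level."""
    return min(levels, key=lambda lv: abs(lv - value))


def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D) over all pairs; tied pairs are ignored."""
    concordant = discordant = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Made-up illustrative data, not real measurements.
predicted = [0.0, 0.2, 0.2, 0.4, 0.6, 0.6, 0.8, 1.0]
observed = [0.07, 0.15, 0.31, 0.38, 0.52, 0.71, 0.77, 0.93]

# Option 1: treat each predicted level as an exact value.
r = pearson(predicted, observed)

# Option 2: bracket the observations, then cross-tabulate ordered categories.
binned = [nearest_level(v) for v in observed]
crosstab = Counter(zip(predicted, binned))  # (predicted, observed bracket) -> count
gamma = goodman_kruskal_gamma(predicted, binned)

print(f"Pearson r = {r:.3f}, gamma = {gamma:.3f}")
```

Under the first option every gap between prediction and outcome lowers the correlation; under the second, an outcome that falls in the predicted bracket counts as a perfect hit, so the two statistics can disagree on the same data.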

Maarten Buis
  • Thanks! At the moment I'm not fully sure what I would want the outcomes to mean; each of your two suggestions seems reasonable to me. But perhaps I should be satisfied with a given outcome if the means of the continuous outcomes for each of the prediction bins are in the neighborhood of the corresponding bin centers. Would that mean adding a third category to your answer? – Sjoerd C. de Vries Aug 16 '13 at 15:21
  • The distribution of the continuous variable would need to be quite special for that to be true, e.g. a continuous uniform distribution would have that property. – Maarten Buis Aug 19 '13 at 12:48
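The check raised in the comments, comparing the mean outcome within each prediction bin to that bin's center, could be sketched roughly as follows. The data are invented for illustration, and `mean_by_prediction` is a hypothetical helper name rather than anything from the thread.

```python
from collections import defaultdict


def mean_by_prediction(predicted, observed):
    """Average the continuous outcomes within each predicted level."""
    groups = defaultdict(list)
    for p, o in zip(predicted, observed):
        groups[p].append(o)
    return {level: sum(vals) / len(vals) for level, vals in sorted(groups.items())}


# Made-up illustrative data, not real measurements.
predicted = [0.0, 0.0, 0.2, 0.2, 0.4, 0.4]
observed = [0.02, 0.10, 0.15, 0.31, 0.38, 0.50]

for level, avg in mean_by_prediction(predicted, observed).items():
    # The offset shows how far the bin mean sits from the bin center.
    print(f"predicted {level:.1f}: mean outcome {avg:.2f}, offset {avg - level:+.2f}")
```

How large an offset is still "in the neighborhood" is a judgment call that this sketch does not settle.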