Scoring classification model performance often seems somewhat abstract (looking at you, AUC scores...). There's always accuracy, which has the advantage of being easy to comprehend and is great for explaining how well the model works to someone else (say, the people who are actually going to use the predictions it makes). I intuitively expect there to be a similarly common method for probability predictions, for example a simple "average distance from truth" along the lines of:
| Truth | Prediction | Score |
| ----- | ---------- | ----- |
| 1 | 0.97 | 0.03 |
| 0 | 0.35 | 0.35 |
| 1 | 0.76 | 0.24 |
| 0 | 0.42 | 0.42 |
With the score for the model as a whole taken as the average of those per-row scores: 0.26 in this case. That's easy enough to do manually, but it surprises me that a) this isn't a common scoring metric and b) there doesn't seem to be a built-in method for it in the scikit-learn API.
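For what it's worth, the metric described above can already be computed with scikit-learn's generic `mean_absolute_error`, since "average distance from truth" is just the mean absolute error between the 0/1 labels and the predicted probabilities. A minimal sketch using the table's values (the closely related Brier score, which averages *squared* distances, is also shown for comparison):

```python
# "Average distance from truth" is mean absolute error between
# the true labels and the predicted probabilities.
from sklearn.metrics import mean_absolute_error, brier_score_loss

y_true = [1, 0, 1, 0]
y_prob = [0.97, 0.35, 0.76, 0.42]

mae = mean_absolute_error(y_true, y_prob)  # average of |truth - prediction|
print(round(mae, 2))  # 0.26, matching the table above

# The Brier score is the same idea but with squared distances,
# which makes it a proper scoring rule.
brier = brier_score_loss(y_true, y_prob)
print(brier)
```

So the computation itself is a one-liner; the question is really about why it isn't conventionally used as a classification score.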
So my question is this: is "average distance from truth" a useful scoring metric, and if not, why not?
scikit-learn has it. In general, though: most scoring rules are derived from Decision Science/Forecasting approaches, so core CS/ML practitioners are not directly exposed to them as part of their academic training. – usεr11852 Feb 23 '19 at 23:11