Averaging Brier score

Question

I am tuning a RandomForestClassifier with GridSearchCV for multiclass classification, and I decided to use the Brier score as the scoring metric.

However, I could only manage to get the Brier score for each class.

Is it reasonable to get the average of that as an overall performance measure? Or can you think of a better way instead?

Edit: I am aware this question is similar, so I'll explain why I think it's a different problem:

When I run my model with the Brier score as defined by that question's author (brier_multi), the score obtained for the best model is 202.3.

However, when I apply the following code (which I wrote myself):

from numpy import mean
from sklearn.metrics import brier_score_loss
from sklearn.preprocessing import label_binarize

def brier_score_multi(y_true, y_pred):
    # One-hot encode both inputs; note that y_pred is treated as class labels here
    y_true_bin = label_binarize(y_true, classes=[0, 1, 2])
    y_pred_bin = label_binarize(y_pred, classes=[0, 1, 2])
    # Average the per-class (one-vs-rest) Brier scores
    return mean([brier_score_loss(y_true_bin[:, k], y_pred_bin[:, k]) for k in range(3)])

The best score is 0.0432.

As you can see, this is a big difference, and given the definition of the Brier score, I'm inclined to trust the second result.
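One thing worth checking here, as a sketch under the assumption that the scorer receives hard class labels from `predict` rather than probabilities: binarizing predicted labels makes every squared error either 0 or 1, so the averaged per-class score reduces to 2/K times the misclassification rate instead of measuring how good the predicted probabilities are. The function name and the toy labels below are made up for illustration.

```python
import numpy as np

def mean_per_class_brier_on_labels(y_true, y_pred, n_classes=3):
    """Mean of one-vs-rest Brier scores when y_pred holds hard labels (assumption)."""
    classes = np.arange(n_classes)
    # One-hot encode both vectors, mirroring label_binarize for labels 0..K-1
    true_onehot = (np.asarray(y_true)[:, None] == classes).astype(float)
    pred_onehot = (np.asarray(y_pred)[:, None] == classes).astype(float)
    # Per-class Brier score = mean squared difference in each column
    per_class = ((true_onehot - pred_onehot) ** 2).mean(axis=0)
    return per_class.mean()

# Toy labels, invented for illustration only
y_true = np.array([0, 1, 2, 2, 1, 0, 0, 1, 2, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0, 2, 1, 2, 0])

score = mean_per_class_brier_on_labels(y_true, y_pred)
error_rate = np.mean(y_true != y_pred)

# Each misclassified sample is wrong in exactly two columns
# (its true class and its predicted class), hence the factor 2/K.
assert np.isclose(score, 2 * error_rate / 3)
```

If that assumption holds for the grid search above, a value of 0.0432 would correspond to a misclassification rate of roughly 6.5%, which is a statement about accuracy rather than about probability estimates.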

EDIT 2:

Seeing as the first result is incorrect, I started thinking: maybe summing the per-class Brier scores makes more sense than averaging them?
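For reference, a short derivation of how the sum and the mean relate. The notation is assumed here rather than taken from the post: N samples, K classes, p_{ik} the predicted probability of class k for sample i, and o_{ik} equal to 1 if sample i belongs to class k and 0 otherwise.

```latex
\begin{align}
\mathrm{BS}_k &= \frac{1}{N}\sum_{i=1}^{N} \left(p_{ik} - o_{ik}\right)^2 \\
\mathrm{BS}_{\text{multi}} &= \frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} \left(p_{ik} - o_{ik}\right)^2
  = \sum_{k=1}^{K} \mathrm{BS}_k \\
\frac{1}{K}\sum_{k=1}^{K} \mathrm{BS}_k &= \frac{\mathrm{BS}_{\text{multi}}}{K}
\end{align}
```

Under these definitions the sum of the per-class scores is the multiclass Brier score, which lies between 0 and 2, and the mean is that same quantity divided by K. Because the two differ only by a positive constant, they rank candidate models identically, provided both are computed from predicted probabilities.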

Not really. For some reason, when I try the brier_multi that he applies, it returns ridiculous numbers. I think it might have something to do with what the random forest does behind the scenes. Actually, let me edit my post and include that — amestrian, Sep 24 '20 at 23:32
The answer to the other question shows that brier_multi is the correct extension of the Brier score to multiple classes. So if your question is not "how do I extend Brier score to multiple classes?" then I don't know what it is. — Sycorax, Sep 24 '20 at 23:54
As a loss function, it doesn’t matter if you divide by the sample size or not. Does your software divide by the sample size? — Dave, Sep 25 '20 at 00:06
If we read a citation in the other answer, the Brier score in the multi-class case is bounded between 0 and 2, so obtaining values of 200 implies some kind of programming or user error. See: https://www.wikiwand.com/en/Brier_score#/overview — Sycorax, Sep 25 '20 at 00:29
@Sycorax that's what I thought... between the random forest itself and the grid search it's a bit of a black box to know exactly what's going on. I'm currently trying some variations to see if I can figure it out but I don't see it likely... would it be too terrible to end up using the one I made? — amestrian, Sep 25 '20 at 01:00
@Dave hmm good question I guess... I'm not sure, just did a google check but couldn't find anything. Anyways I don't think that should affect this score, since I implemented self-made scorers before and the results were reasonable — amestrian, Sep 25 '20 at 01:02
What if instead of averaging it I sum the brier score for the three classes? — amestrian, Sep 25 '20 at 01:33
Dave is correct that we don’t care about rescaling by a positive constant, because the two forms will have the same minima. But it’s still not clear to me why you’re fixated on making a new variation on Brier score. You’ve demonstrated that you’ve got a programming or user error. Find that and you’re done. — Sycorax, Sep 25 '20 at 01:38
because I've tried many things so far and nothing is working, and I don't have enough time to go to the source code to figure out exactly what is wrong with it... — amestrian, Sep 25 '20 at 01:41
But you’re convinced that you have enough time to invent a new, untested idea and make decisions based on it? — Sycorax, Sep 25 '20 at 01:41
Well... I have to use something. I asked something else in here and everyone recommended using this scoring technique (or log loss, but it's the same issue), because my previous one was not good, so I'm trying it. — amestrian, Sep 25 '20 at 01:45
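A minimal sketch of one way to compute the multiclass Brier score from predicted probabilities inside a grid search, assuming the estimator exposes `predict_proba` and `classes_` as scikit-learn classifiers do. The names `brier_multi_from_proba`, `neg_brier_scorer`, and `UniformDummy` are hypothetical, and the dummy estimator exists only to keep the example self-contained; it is not the brier_multi function from the linked question.

```python
import numpy as np

def brier_multi_from_proba(y_true, proba, classes):
    """Multiclass Brier score: mean over samples of the summed squared error.

    proba has shape (n_samples, n_classes), with columns ordered like `classes`.
    """
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    # One-hot encode the true labels in the same column order as proba
    onehot = (y_true[:, None] == np.asarray(classes)[None, :]).astype(float)
    # Sum over classes, then divide by the number of samples via the mean
    return float(np.mean(np.sum((proba - onehot) ** 2, axis=1)))

def neg_brier_scorer(estimator, X, y):
    """Callable scorer with the (estimator, X, y) signature; higher is better."""
    proba = estimator.predict_proba(X)
    return -brier_multi_from_proba(y, proba, estimator.classes_)

class UniformDummy:
    """Stand-in estimator (illustration only) predicting 1/3 for every class."""
    classes_ = np.array([0, 1, 2])

    def predict_proba(self, X):
        return np.full((len(X), 3), 1.0 / 3.0)

# Toy data, invented for illustration only
X_toy = np.zeros((4, 1))
y_toy = np.array([0, 1, 2, 1])
uniform_score = -neg_brier_scorer(UniformDummy(), X_toy, y_toy)
```

A callable like this can be passed as `scoring=neg_brier_scorer` to `GridSearchCV`, which accepts scorers with this signature; the negation is there because grid search maximizes the score. With this definition any value outside the range 0 to 2 (before negation) would point to a bug, for example a missing division by the sample size or columns that do not line up with the class order.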
