How to implement a GridSearchCV custom scorer that is dependent on a training feature?

Question

I would like to write a custom scoring function using make_scorer, where my custom_function(y_true, y_pred) calculates the DAILY sumproduct of y_true and y_pred and outputs, for example, the mean over days. The problem is that the timestamps are only available as a feature in my X matrix, and I cannot access the index of the current fold.

Does anyone have a clue how to make this possible?
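For reference, the metric described in the question can be sketched as a plain function. This is a sketch, not the asker's code: the name `daily_sumproduct_mean` is hypothetical, and it assumes the timestamps are numpy `datetime64` values. Note that it needs a third argument (the timestamps), which is exactly what a make_scorer-wrapped `(y_true, y_pred)` metric never receives.

```python
import numpy as np

def daily_sumproduct_mean(y_true, y_pred, timestamps):
    """Mean over days of sum(y_true * y_pred) within each day (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Truncate datetime64 timestamps to calendar days (assumes datetime64 input)
    days = np.asarray(timestamps).astype("datetime64[D]")
    unique_days, day_index = np.unique(days, return_inverse=True)
    # Accumulate y_true * y_pred into one bin per day
    daily = np.bincount(day_index, weights=y_true * y_pred, minlength=len(unique_days))
    return daily.mean()

ts = np.array(["2021-01-01T09:00", "2021-01-01T17:00", "2021-01-02T09:00"],
              dtype="datetime64[m]")
print(daily_sumproduct_mean([1, 2, 3], [4, 5, 6], ts))  # (1*4 + 2*5 + 3*6) split by day -> mean of 14 and 18
```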

score 3 · Answer 1 · answered Jan 01 '22 at 15:33

One option is to create a custom score function that calculates the loss and groups by day.

Here is a rough start:

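import numpy as np
from sklearn.model_selection import GridSearchCV

# This function already has the scorer signature (estimator, X, y), so it is
# passed to GridSearchCV directly; make_scorer expects a (y_true, y_pred) metric.
def custom_loss_function(model, X, y):
    y_pred = model.predict(X)
    y_true = np.asarray(y)
    difference = y_pred - y_true
    group_timestamp = np.asarray(X)[:, 0]  # Timestamp column (assumed to be column 0)
    # Groupby: sum the differences within each day, then average over days
    score_by_day = np.array([difference[group_timestamp == i].sum()
                             for i in np.unique(group_timestamp)])
    score_overall = np.mean(score_by_day)
    return score_overall  # GridSearchCV maximizes this; negate it if it is a loss

# model and param_grid are defined elsewhere
search = GridSearchCV(model,
                      param_grid=param_grid,
                      scoring=custom_loss_function)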
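A self-contained run on synthetic data, as a sketch of how this plugs into GridSearchCV. The data, the `Ridge` model and the alpha grid are assumptions for illustration, and the scorer is a variant of the one above that returns the negated mean absolute daily error, so that the largest score is the best one. Because GridSearchCV calls the scorer with the validation fold's X and y, the day column is available per fold without needing the fold indices.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

def custom_loss_function(estimator, X, y):
    # Called by GridSearchCV with the current validation fold only
    y_pred = estimator.predict(X)
    difference = y_pred - np.asarray(y)
    day = np.asarray(X)[:, 0]  # assumption: column 0 holds an integer day id
    by_day = [difference[day == d].sum() for d in np.unique(day)]
    # Negated mean absolute daily error: greater is better for GridSearchCV
    return -float(np.mean(np.abs(by_day)))

rng = np.random.default_rng(0)
day = np.repeat(np.arange(10), 12)  # synthetic: 10 days, 12 rows per day
feature = rng.normal(size=day.size)
X = np.column_stack([day, feature])
y = 3.0 * feature + rng.normal(scale=0.1, size=day.size)

search = GridSearchCV(Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]},
                      scoring=custom_loss_function, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```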

score 1 · Answer 2 · answered Oct 12 '20 at 03:06

You should be able to do this, but without make_scorer.

The "scoring objects" used in hyperparameter searches in sklearn, such as those produced by make_scorer, have the signature (estimator, X, y). Compare this with metrics/scores/losses, such as those passed as input to make_scorer, which have the signature (y_true, y_pred).
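To make the two signatures concrete, here is a small sketch with toy data (the data and model are arbitrary choices for illustration): the metric is called with `(y_true, y_pred)`, while the object returned by `make_scorer` is called with `(estimator, X, y)` and, with `greater_is_better=False`, returns the negated metric.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_absolute_error

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 4.0, 8.0])
model = LinearRegression().fit(X, y)

# A metric takes (y_true, y_pred) ...
metric_value = mean_absolute_error(y, model.predict(X))
# ... while the scorer built by make_scorer takes (estimator, X, y)
scorer = make_scorer(mean_absolute_error, greater_is_better=False)
scorer_value = scorer(model, X, y)
print(metric_value, scorer_value)  # the scorer returns the negated metric
```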

So the solution is to define your own "scoring object" directly and use the X that is passed to it to do the computation you want. See the scikit-learn User Guide section on implementing your own scoring object for more details.
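A minimal sketch of such a scoring object for the daily sumproduct from the question, assuming (hypothetically) that column 0 of X holds the timestamp already reduced to an integer day id. The name `daily_sumproduct_scorer` and the toy data are illustrative, not part of any library; the function is passed as `scoring=` itself, without make_scorer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def daily_sumproduct_scorer(estimator, X, y):
    """Scoring object with the (estimator, X, y) signature used by GridSearchCV.

    Assumes column 0 of X is the day id; GridSearchCV passes only the current
    validation fold, so X and y here already belong to that fold.
    """
    X = np.asarray(X)
    y_true = np.asarray(y, dtype=float)
    y_pred = estimator.predict(X)
    unique_days, day_index = np.unique(X[:, 0], return_inverse=True)
    # Daily sumproduct of y_true and y_pred, then the mean over days
    daily = np.bincount(day_index, weights=y_true * y_pred, minlength=len(unique_days))
    return float(daily.mean())

# Usage: GridSearchCV(model, param_grid=param_grid, scoring=daily_sumproduct_scorer)
X = np.array([[0, 1.0], [0, 2.0], [1, 3.0]])
y = np.array([1.0, 2.0, 3.0])
model = LinearRegression().fit(X, y)
print(daily_sumproduct_scorer(model, X, y))
```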
