
I would like to write a custom scoring function using make_scorer, where my custom_function(y_true, y_pred) calculates the daily sumproduct of y_true and y_pred and returns, for example, its mean across days. The problem is that the timestamps are only available as a feature in my X matrix, and I cannot access the indices of the current fold from inside the function.

Does anyone have a clue how to make this possible?

Ben Reiniger
momobz0

2 Answers


One option is to create a custom score function that calculates the loss and groups by day.

Here is a rough start:

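import numpy as np
from sklearn.model_selection import GridSearchCV

def custom_loss_function(model, X, y):
    # Scorer signature (estimator, X, y): predicting here keeps X accessible
    y_pred = model.predict(X)
    y_true = y
    difference = y_pred - y_true
    group_timestamp = X[:, 0]  # Timestamp column (assumes X is a NumPy array)
    # Group by timestamp value and sum the differences within each group
    score_by_day = np.array([difference[group_timestamp == i].sum() for i in np.unique(group_timestamp)])
    score_overall = np.mean(score_by_day)
    return score_overall

# The function already has the scorer signature (estimator, X, y), so it is
# passed directly as scoring instead of being wrapped in make_scorer.
# GridSearchCV treats higher scores as better.
# model and param_grid are assumed to be defined elsewhere.
GridSearchCV(model, param_grid=param_grid, scoring=custom_loss_function)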
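
If the timestamp column holds full timestamps rather than one label per day, the grouping step first needs a day key. Here is a minimal sketch of that step using pandas. The helper name mean_daily_sum and the sample values are hypothetical, and it assumes the timestamps can be parsed by pd.to_datetime.

```python
import numpy as np
import pandas as pd

def mean_daily_sum(values, timestamps):
    """Sum `values` within each calendar day, then average the daily sums."""
    # Floor every timestamp to midnight so rows from the same day share a key
    days = pd.to_datetime(pd.Series(np.asarray(timestamps))).dt.floor("D")
    daily = pd.Series(np.asarray(values, dtype=float)).groupby(days).sum()
    return float(daily.mean())

# Hypothetical sample: two rows on 1 January, one row on 2 January
stamps = ["2021-01-01 09:00", "2021-01-01 17:00", "2021-01-02 09:00"]
result = mean_daily_sum([1.0, 2.0, 5.0], stamps)
```

Inside a scoring function, values would be whatever per-row quantity is being aggregated, for example y_pred - y_true or y_true * y_pred.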

Brian Spiering

You should be able to do this, but without make_scorer.

The "scoring objects" used in sklearn hyperparameter searches, such as those produced by make_scorer, have the signature (estimator, X, y). Compare this with metrics, scores, and losses, such as those passed to make_scorer, which have the signature (y_true, y_pred).
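
To make the difference in signatures concrete, here is a small sketch on made-up data. DummyRegressor and mean_absolute_error are only stand-ins chosen for illustration. The metric is called with (y_true, y_pred), while the object returned by make_scorer is called with (estimator, X, y) and does the prediction itself.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer, mean_absolute_error

# Made-up data, purely for illustration
X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
est = DummyRegressor(strategy="mean").fit(X, y)

# A metric takes (y_true, y_pred) and never sees X
metric_value = mean_absolute_error(y, est.predict(X))

# A scorer takes (estimator, X, y) and calls predict internally;
# greater_is_better=False makes it return the negated metric
scorer = make_scorer(mean_absolute_error, greater_is_better=False)
scorer_value = scorer(est, X, y)
```

Because the wrapped metric only ever receives y_true and y_pred, it has no way to read a timestamp column out of X.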

So the solution is to define your own "scoring object" directly and use the X passed to it to do the computation you want. See the User Guide for more details.
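
As a rough sketch of such a scoring object for the daily sumproduct in the question: the names daily_sumproduct_scorer and DAY_COL are hypothetical, and the code assumes the day identifier sits in column 0 of a numeric X. LinearRegression and the toy data are only there to make the example runnable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

DAY_COL = 0  # assumption: column 0 of X holds one identifier per day

def daily_sumproduct_scorer(estimator, X, y):
    """Mean over days of sum(y_true * y_pred) within each day."""
    X = np.asarray(X)
    y_true = np.asarray(y, dtype=float)
    y_pred = estimator.predict(X)
    # Map each row to the index of its day among the unique day values
    days, day_idx = np.unique(X[:, DAY_COL], return_inverse=True)
    # Sum y_true * y_pred per day, then average the daily totals
    daily = np.bincount(day_idx.ravel(), weights=y_true * y_pred,
                        minlength=len(days))
    return float(daily.mean())

# Toy data: day identifier in column 0, one ordinary feature in column 1
X = np.array([[0, 1.0], [0, 2.0], [1, 3.0], [1, 4.0], [1, 5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
model = LinearRegression().fit(X, y)
score = daily_sumproduct_scorer(model, X, y)
```

The same callable can be passed as scoring=daily_sumproduct_scorer to GridSearchCV or cross_val_score, which call it on each validation fold with that fold's X and y.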

Ben Reiniger